Sun Studio C++ - Query on C++ locales/facets
This question is not answered.

This topic has 7 replies on 1 page.
LeoCarreon
Posts:12
Registered: 3/17/08
Query on C++ locales/facets   
Jul 17, 2008 4:46 PM
 
 
Hi,

I'm currently using Solaris 10 U5 (x86) and Sun Studio 12 09/07 with all the Sun Studio patches that I am allowed to download applied. The operating system's default locale is set to en_AU.UTF-8.

I'm having trouble understanding the behaviour of the various facets in both the C/POSIX/classic locale and a user-preferred locale. My understanding is that in ISO 14882:1998 and ISO 14882:2003 (the C++ Standard), Table 51 (Locale Category Facets) lists the standard facets that a compiler implementation must provide for the classic locale, and Table 52 (Required Instantiations) lists the facets that it must provide for the user-preferred locale. This is also what Bjarne Stroustrup explains in Appendix D of his book The C++ Programming Language, 3rd Edition.

Based on the above, I'm assuming that the definition of the C/POSIX/classic locale is as per chapter 7 (Locale) of ISO 9945-1:2003 (POSIX Base Definitions) while user preferred locales will depend on how these locales are defined by Solaris 10.

My dilemma is that when I examine the facets of the C locale using wide characters, I observe the following:

- The numpunct<wchar_t> facet does not comply with the LC_NUMERIC definition for the POSIX locale.
- The moneypunct<wchar_t, true|false> facet does not comply with the LC_MONETARY definition for the POSIX locale.

I'm a bit confused, though, because Stroustrup's book lists the values returned by the above facets for the C locale, and those values do not comply with the POSIX spec either.

For the en_AU.UTF-8 locale using wide characters, I have the following observations:

- The ctype<wchar_t> facet does not have character classifications for characters above 0x7F.
- The numpunct<wchar_t> facet does not comply with Australian conventions.

I also wrote the following test program and it is not behaving as I expected it to when the file it is reading is encoded in ISO 8859-1:

#include <fstream>
#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main()
{
    // Make the user's preferred locale (here en_AU.UTF-8) the global locale.
    locale::global(locale(""));

    wifstream vFile("ISO_639-2_8859-1.txt");
    // Replace the stream's locale with the classic "C" locale.
    vFile.imbue(locale::classic());

    wstring vLineBuff;
    int vLineCount = 0;
    // getline() returns the stream, which tests false on failure or end of file.
    while (getline(vFile, vLineBuff))
    {
        wcout << vLineBuff << endl;
        ++vLineCount;
    }
    vFile.close();
    wcout << "There were " << vLineCount << " lines in the file" << endl;
    return 0;
}


If I used the above code to read a plain ASCII file, it appears to work correctly.

Any comments on the above will be much appreciated. Does anyone know if there is any documentation available explaining the implementation details of the various facets?

Regards,
Leo
 
clamage45
Posts:3,034
Registered: 6/23/06
Re: Query on C++ locales/facets   
Jul 18, 2008 8:56 AM (reply 1 of 7)  (In reply to original post )
 
 
Are you using the default library (libCstd), or are you using the -library=stlport4 option?
STLport has no support for locales other than the C locale.
 
sebor@roguewave.com
Posts:73
Registered: 7/25/05
Re: Query on C++ locales/facets   
Jul 18, 2008 10:32 AM (reply 2 of 7)  (In reply to original post )
 
 
It would help if you could be more specific about the problems you are seeing with the numpunct<wchar_t> and moneypunct<wchar_t> facets. The numpunct.cpp example from the Apache/Rogue Wave C++ Standard Library, for instance (when adjusted to use the "POSIX" locale and compiled with Sun C++ 5.9 on Solaris 10), produces results that are comparable to the output of running locale -ck LC_NUMERIC:
$ ./a.out 
POSIX locale
Decimal point       = .
Thousands separator = ,
True name           = true
False name          = false

$ LC_ALL=POSIX locale -ck LC_NUMERIC
LC_NUMERIC
decimal_point="."
thousands_sep=""
grouping=-1

The difference between the thousands separators is due to the fact that unlike the C library, the C++ numpunct and moneypunct have no way to indicate the absence of a value. In C, thousands_sep is a char*, and setting it to the empty string ("") indicates that no value has been provided. In C++, numpunct::thousands_sep() returns a char or wchar_t, and there's no such thing as the empty character (NUL would still be a valid "non-empty" character).
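
To illustrate the point, here is a minimal sketch of how the two members interact when a number is formatted. The GroupByThree facet and the helper function names are made up for this example; the only assumption is a conforming numpunct<char> in the classic locale. The separator character is only used when grouping() is non-empty, which is why the ',' reported for the C locale never shows up in the output.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: same separator as the C locale,
// but with a non-empty grouping so that the separator is actually used.
struct GroupByThree : std::numpunct<char> {
protected:
    virtual char do_thousands_sep() const { return ','; }
    virtual std::string do_grouping() const { return "\3"; }
};

// Format n with whatever numpunct<char> facet loc carries.
std::string format_with(const std::locale& loc, long n)
{
    std::ostringstream out;
    out.imbue(loc);
    out << n;
    return out.str();
}

// Classic "C" locale: thousands_sep() is ',' but grouping() is empty,
// so no separators are inserted.
std::string format_classic(long n)
{
    return format_with(std::locale::classic(), n);
}

// Classic locale with its numpunct<char> facet replaced by GroupByThree.
std::string format_grouped(long n)
{
    return format_with(std::locale(std::locale::classic(), new GroupByThree), n);
}
```

Calling format_classic(1234567) and format_grouped(1234567) from main() shows the difference: only the second result contains separators.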

As for the inconsistent results returned by the ctype_byname<wchar_t> facet, I was able to reproduce the problem in my environment using the program below. It looks like a bug in the native C++ Standard Library. (STLport doesn't support named locales on Solaris, so the program aborts with the -library=stlport4 option; the same goes for gcc on Solaris with its own implementation of the C++ Standard Library.) The program does work as expected with the Apache C++ Standard Library.

#include <clocale>
#include <cstdio>
#include <wctype.h>
#include <locale>

int main (int argc, char *argv[])
{
    const std::locale locale (1 < argc ? argv [1] : "");
    std::locale::global (locale);

    for (int i = 0; i < 256; ++i) {

        const wchar_t wc (i);

        std::printf ("'\\%03o' ('%c')   "
                     "%d:%d  %d:%d  %d:%d  %d:%d  %d:%d  "
                     "%d:%d  %d:%d  %d:%d  %d:%d  %d:%d\n",
                     i, 32 < i && i < 127 ? char (i) : ' ',
                     (std::isalnum)(wc, locale), !!(iswalnum)(i),
                     (std::isalpha)(wc, locale), !!(iswalpha)(i),
                     (std::iscntrl)(wc, locale), !!(iswcntrl)(i),
                     (std::isdigit)(wc, locale), !!(iswdigit)(i),
                     (std::isgraph)(wc, locale), !!(iswgraph)(i),
                     (std::islower)(wc, locale), !!(iswlower)(i),
                     (std::isprint)(wc, locale), !!(iswprint)(i),
                     (std::ispunct)(wc, locale), !!(iswpunct)(i),
                     (std::isupper)(wc, locale), !!(iswupper)(i),
                     (std::isxdigit)(wc, locale), !!(iswxdigit)(i));
    }
}
 
LeoCarreon
Posts:12
Registered: 3/17/08
Re: Query on C++ locales/facets   
Jul 18, 2008 9:50 PM (reply 3 of 7)  (In reply to #1 )
 
 
I am using the default library - libCstd.
 
LeoCarreon
Posts:12
Registered: 3/17/08
Re: Query on C++ locales/facets   
Jul 18, 2008 10:18 PM (reply 4 of 7)  (In reply to #2 )
 
 
These are the results I get when I examine the values returned by the various facets:

// Too long to include here but shows classifications for characters 0x00 to 0x7F.
Locale: C
  Facet: ctype<wchar_t>
    Character classifications:

// I'm not sure how compliant these are to the standard.
Locale: C
  Facet: codecvt<wchar_t, char, mbstate_t>
    Code converter characteristics:
      Encoding = 1
      Always no conversion = false
      Length = 12
      Maximum length = 1

// I do get the point about having to set the grouping character.
Locale: C
  Facet: numpunct<wchar_t>
    Numeric punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping =
      True name = true
      False name = false

// According to Stroustrup's book there should be no grouping.
Locale: C
  Facet: moneypunct<wchar_t, false>
    Local monetary punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping = 3
      Currency symbol = 0x00000024
      Positive sign =
      Negative sign = 0x0000002D
      Number of fractional digits = 2
      Positive format = symbol,sign,none,value
      Negative format = symbol,sign,none,value

// Again grouping not expected.
Locale: C
  Facet: moneypunct<wchar_t, true>
    International monetary punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping = 3
      Currency symbol = 0x00000055,0x00000053,0x00000044,0x00000020
      Positive sign =
      Negative sign = 0x0000002D
      Number of fractional digits = 2
      Positive format = symbol,sign,none,value
      Negative format = symbol,sign,none,value

// Too long to include here but shows classifications for characters 0x00 to 0x7F only.
Locale: en_AU.UTF-8
  Facet: ctype<wchar_t>
    Character classifications:

// Again not sure how compliant these are.
Locale: en_AU.UTF-8
  Facet: codecvt<wchar_t, char, mbstate_t>
    Code converter characteristics:
      Encoding = 1
      Always no conversion = false
      Length = 12
      Maximum length = 1

// Missing true/false name.
Locale: en_AU.UTF-8
  Facet: numpunct<wchar_t>
    Numeric punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping =
      True name =
      False name =

// Looks OK.
Locale: en_AU.UTF-8
  Facet: moneypunct<wchar_t, false>
    Local monetary punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping = 3
      Currency symbol = 0x00000024
      Positive sign =
      Negative sign = 0x0000002D
      Number of fractional digits = 2
      Positive format = sign,symbol,none,value
      Negative format = sign,symbol,none,value

// Looks OK.
Locale: en_AU.UTF-8
  Facet: moneypunct<wchar_t, true>
    International monetary punctuation characteristics:
      Decimal point character = 0x0000002E
      Grouping character = 0x0000002C
      Grouping = 3
      Currency symbol = 0x00000041,0x00000055,0x00000044,0x00000020
      Positive sign =
      Negative sign = 0x0000002D
      Number of fractional digits = 2
      Positive format = symbol,sign,none,value
      Negative format = symbol,sign,none,value


I will be more than happy to supply the code I used to generate the above results. It's a bit lengthy, so I haven't included it here, but let me know if you want to have a look at it.

I do get your point regarding character values though. There is no way to specify an empty value.

Are you saying that the Apache C++ Standard Library works with the Sun Studio 12 C++ compiler?
 
sebor@roguewave.com
Posts:73
Registered: 7/25/05
Re: Query on C++ locales/facets   
Jul 19, 2008 2:49 PM (reply 5 of 7)  (In reply to #4 )
 
 
The behavior of the wchar_t specializations of most facets in the C locale is implementation-defined and depends on the native encoding of wchar_t. The idea is that facets such as ctype and codecvt return the same values as the corresponding C library functions, but the exact values aren't specified by C++. For numpunct, the values are specified to be the same for both the char and the wchar_t specializations:
Member                      numpunct<char>   numpunct<wchar_t>
numpunct::decimal_point()   '.'              L'.'
numpunct::thousands_sep()   ','              L','
numpunct::grouping()        string()         string()
numpunct::truename()        "true"           L"true"
numpunct::falsename()       "false"          L"false"
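
A quick way to compare a given library against the values listed above is to read them straight from the classic locale. This is only a sketch: the NumpunctValues struct and the function name are invented for this example, and what it returns naturally depends on the library you build it with.

```cpp
#include <locale>
#include <string>

// Hypothetical holder for the values reported by numpunct<wchar_t>.
struct NumpunctValues {
    wchar_t      decimal_point;
    wchar_t      thousands_sep;
    std::string  grouping;      // grouping() returns a narrow string in every specialization
    std::wstring truename;
    std::wstring falsename;
};

// Read the numpunct<wchar_t> values of the classic "C" locale.
NumpunctValues classic_wide_numpunct()
{
    const std::numpunct<wchar_t>& np =
        std::use_facet<std::numpunct<wchar_t> >(std::locale::classic());

    NumpunctValues v;
    v.decimal_point = np.decimal_point();
    v.thousands_sep = np.thousands_sep();
    v.grouping      = np.grouping();
    v.truename      = np.truename();
    v.falsename     = np.falsename();
    return v;
}
```

Replacing std::locale::classic() with a named locale such as std::locale("en_AU.UTF-8") gives the corresponding values for that locale, provided the library can construct it.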

For moneypunct, the values are unspecified except for pos_format() and neg_format(), which are both required to return:
{ symbol, sign, none, value }

in all four specializations of the facet (i.e., moneypunct<char, false>, moneypunct<char, true>, moneypunct<wchar_t, false>, and moneypunct<wchar_t, true>).
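
The format requirement can be checked with a few lines of code. The sketch below assumes nothing beyond the standard moneypunct interface; the function name is made up for this example.

```cpp
#include <locale>

// Returns true if both pos_format() and neg_format() of
// moneypunct<CharT, Intl> in the classic "C" locale are
// { symbol, sign, none, value }.
template <class CharT, bool Intl>
bool classic_formats_are_default()
{
    typedef std::moneypunct<CharT, Intl> Punct;
    const Punct& mp = std::use_facet<Punct>(std::locale::classic());

    const std::money_base::pattern pos = mp.pos_format();
    const std::money_base::pattern neg = mp.neg_format();

    // pattern::field is an array of four chars holding money_base::part values.
    const char expected[4] = {
        std::money_base::symbol, std::money_base::sign,
        std::money_base::none,   std::money_base::value
    };

    for (int i = 0; i < 4; ++i) {
        if (pos.field[i] != expected[i] || neg.field[i] != expected[i])
            return false;
    }
    return true;
}
```

Instantiating it for each of the four specializations, for example classic_formats_are_default<wchar_t, true>(), tells you whether a given library follows the requirement.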

For named locales, neither C++ nor C specifies the exact values or behavior, but they are (obviously) expected to be representative of the specified locales. This usually works well in C, but I'm afraid that many C++ Standard Library implementations fall short in some respects. As I already mentioned, of the ones "natively" available on Solaris, only libCstd even attempts to provide any useful behavior at all; both STLport and gcc's libstdc++ simply throw an exception when you try to create a named locale. Unfortunately, even libCstd doesn't provide very robust support for named locales. In particular, its codecvt_byname implementation is next to useless, and as we've already seen, ctype_byname isn't much better. Without these two, any I/O in named locales is going to be of limited use at best (there is no way to read or write UTF-8 files using fstream, for example).

If you need a robust implementation of C++ locales for Solaris the only one I know of is the Apache C++ Standard Library (formerly Rogue Wave's libstd). It compiles and is usable with several versions of Sun C++ including 5.9 (Sun Studio 12), as well as most other compilers on other popular operating systems. You can get it from the Download page on the stdcxx site. Building it is quite easy (see the README for instructions) and if you run into problems feel free to ask on the user@stdcxx.apache.org mailing list. You should also feel free to send us the source code for your test program if you'd like to see the results with stdcxx.

If using an alternate C++ Standard Library isn't an option for you I recommend using the C library. You'll get much more predictable results that way.
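
As a rough sketch of what going through the C library looks like for the numeric conventions discussed in this thread, the function below queries LC_NUMERIC with setlocale() and localeconv(). The CNumeric struct and the function name are invented for this example, and the locale names you can pass depend on what is installed on the system.

```cpp
#include <clocale>
#include <string>

// Hypothetical holder for the two LC_NUMERIC strings of interest.
struct CNumeric {
    std::string decimal_point;
    std::string thousands_sep;   // empty string means "no separator"
};

// Query LC_NUMERIC for the named locale through the C library.
// Returns false if the locale is not available. The previous
// LC_NUMERIC setting is restored before returning.
bool c_numeric_for(const char* name, CNumeric& out)
{
    const char* current = std::setlocale(LC_NUMERIC, 0);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_NUMERIC, name))
        return false;

    const std::lconv* lc = std::localeconv();
    out.decimal_point = lc->decimal_point;
    out.thousands_sep = lc->thousands_sep;

    std::setlocale(LC_NUMERIC, saved.c_str());
    return true;
}
```

Unlike numpunct::thousands_sep(), the C interface can report an absent separator directly, as an empty string.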
 
LeoCarreon
Posts:12
Registered: 3/17/08
Re: Query on C++ locales/facets   
Jul 23, 2008 5:00 AM (reply 6 of 7)  (In reply to #5 )
 
 
Are there any plans for Sun to provide a more robust Standard C++ Library, especially the classes dealing with locales and facets?
 
clamage45
Posts:3,034
Registered: 6/23/06
Re: Query on C++ locales/facets   
Jul 23, 2008 8:03 AM (reply 7 of 7)  (In reply to #6 )
 
 
The default library, libCstd, is provided for compatibility across all C++ 5.x releases, and is shipped as part of Solaris. As explained in the C++ FAQ section on library compatibility
http://developers.sun.com/sunstudio/documentation/ss12/mr/READMEs/c++_faq.html#LibComp
it lacks some features required by the C++ standard.

Sun Studio also comes with an STLport implementation of the standard library, which is quite standard-conforming except for lacking support for locales. (I think the latest update of STLport still does not have locale support, but it's been a couple of months since I last checked.)

Until recently, there was no good answer for a standard-conforming library with locale support, other than commercial libraries like Rogue Wave or Dinkumware. But now you can get the Apache stdcxx library, the Rogue Wave library placed in Open Source. This library supports Sun Studio.
http://stdcxx.apache.org/
The Sun Studio C++ Users Guide explains how to use a 3rd-party library replacement like stdcxx in section 12.7 "Replacing the C++ Standard Library"
http://docs.sun.com/app/docs/doc/819-5267/6n7c46e4p?a=view

In a future release, we plan to provide a standard-conforming library as part of Sun Studio.
 