Read file encoding .net




















.NET includes the EncoderReplacementFallback and DecoderReplacementFallback classes, which substitute a replacement string if a character does not map exactly in an encoding or decoding operation. By default, this replacement string is a question mark, but you can call a class constructor overload to choose a different string.
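As a minimal sketch of replacement fallback, the code below round-trips a string through ASCII with a caller-chosen replacement string. The class name ReplacementFallbackDemo, the helper RoundTripAscii, and the sample text are hypothetical names chosen for illustration; only the Encoding.GetEncoding overload and the two fallback classes come from .NET.

```csharp
using System;
using System.Text;

public static class ReplacementFallbackDemo
{
    // Encodes text to ASCII, substituting `replacement` for every character
    // that ASCII cannot represent, then decodes the bytes back to a string.
    public static string RoundTripAscii(string text, string replacement)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));

        byte[] bytes = ascii.GetBytes(text);
        return ascii.GetString(bytes);
    }

    public static void Main()
    {
        // U+00E9 (e with acute accent) has no ASCII equivalent.
        Console.WriteLine(RoundTripAscii("caf\u00E9", "*"));    // prints "caf*"
        Console.WriteLine(RoundTripAscii("caf\u00E9", "(?)"));  // prints "caf(?)"
    }
}
```

The second call shows that the replacement string does not have to be a single character.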

Typically, the replacement string is a single character, but this is not a requirement: you are free to choose any replacement string, and it can contain multiple characters. You can also implement a replacement class for an encoding.

Instead of providing a best-fit fallback or a replacement string, an encoder can throw an EncoderFallbackException if it is unable to encode a set of characters, and a decoder can throw a DecoderFallbackException if it is unable to decode a byte array.

To throw an exception in encoding and decoding operations, you supply an EncoderExceptionFallback object and a DecoderExceptionFallback object, respectively, to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method. You can also implement a custom exception handler for an encoding operation.

The EncoderFallbackException and DecoderFallbackException objects provide the following information about the condition that caused the exception.

The EncoderFallbackException object includes an IsUnknownSurrogate method, which indicates whether the character or characters that cannot be encoded represent an unknown surrogate pair (in which case the method returns true) or an unknown single character (in which case the method returns false).

The characters in the surrogate pair are available from the EncoderFallbackException.CharUnknownHigh and EncoderFallbackException.CharUnknownLow properties.

The unknown single character is available from the EncoderFallbackException.CharUnknown property. The EncoderFallbackException.Index property indicates the position in the string at which the first character that could not be encoded was found.
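The encoder-side diagnostics can be sketched as follows. This assumes a strict ASCII encoding built with the exception fallbacks; EncoderExceptionDemo and Describe are hypothetical names, and the message format is made up for the example.

```csharp
using System;
using System.Text;

public static class EncoderExceptionDemo
{
    // Tries to encode `text` as strict ASCII and reports what went wrong, if anything.
    public static string Describe(string text)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(text);
            return "ok";
        }
        catch (EncoderFallbackException e)
        {
            if (e.IsUnknownSurrogate())
            {
                // A surrogate pair: both halves are reported separately.
                return $"surrogate U+{(int)e.CharUnknownHigh:X4} U+{(int)e.CharUnknownLow:X4} at index {e.Index}";
            }
            // A single character outside the target encoding.
            return $"char U+{(int)e.CharUnknown:X4} at index {e.Index}";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe("plain"));
        Console.WriteLine(Describe("caf\u00E9"));
    }
}
```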

The DecoderFallbackException object includes a BytesUnknown property that returns an array of bytes that cannot be decoded. The DecoderFallbackException.Index property indicates the starting position of the unknown bytes.

Although the EncoderFallbackException and DecoderFallbackException objects provide adequate diagnostic information about the exception, they do not provide access to the encoding or decoding buffer.
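A matching sketch for the decoder side, again assuming a strict ASCII encoding; DecoderExceptionDemo and Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class DecoderExceptionDemo
{
    // Tries to decode `bytes` as strict ASCII and reports the offending bytes, if any.
    public static string Describe(byte[] bytes)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            return "ok: " + strictAscii.GetString(bytes);
        }
        catch (DecoderFallbackException e)
        {
            // BytesUnknown can be null for some exception constructors, so guard it.
            string unknown = e.BytesUnknown == null ? "" : BitConverter.ToString(e.BytesUnknown);
            return $"bytes {unknown} at index {e.Index}";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new byte[] { 0x41, 0x42 }));
        Console.WriteLine(Describe(new byte[] { 0x41, 0xE9 }));  // 0xE9 is not valid ASCII
    }
}
```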

Therefore, they do not allow invalid data to be replaced or corrected within the encoding or decoding method.

In addition to the best-fit mapping that is implemented internally by code pages, .NET includes the replacement fallback and exception fallback classes described above for implementing a fallback strategy. You can also implement a custom solution that uses best-fit fallback, replacement fallback, or exception fallback, by following these steps:

1. Derive a class from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations.

2. Derive a class from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations.

3. For exception fallback, if the predefined EncoderFallbackException and DecoderFallbackException classes do not meet your needs, derive a class from an exception object such as Exception or ArgumentException.

To implement a custom fallback solution, you must create a class that inherits from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations.

Instances of these classes are passed to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method and serve as the intermediary between the encoding class and the fallback implementation. When you create a custom fallback solution for an encoder or decoder, you must implement the following members:

The EncoderFallback.MaxCharCount or DecoderFallback.MaxCharCount property, which returns the maximum possible number of characters that the best-fit, replacement, or exception fallback can return to replace a single character. For a custom exception fallback, its value is zero.

The EncoderFallback.CreateFallbackBuffer or DecoderFallback.CreateFallbackBuffer method. The method is called by the encoder when it encounters the first character that it is unable to successfully encode, or by the decoder when it encounters the first byte that it is unable to successfully decode.

To implement a custom fallback solution, you must also create a class that inherits from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations.

Instances of these buffer classes are returned by the CreateFallbackBuffer method of the fallback classes. Each instance represents a buffer that contains the fallback characters that will replace the character that cannot be encoded or the byte sequence that cannot be decoded.

A class derived from EncoderFallbackBuffer or DecoderFallbackBuffer provides the following members:

The EncoderFallbackBuffer.Fallback or DecoderFallbackBuffer.Fallback method. EncoderFallbackBuffer.Fallback is called by the encoder to provide the fallback buffer with information about the character that it cannot encode. Because the character to be encoded may be a surrogate pair, this method is overloaded: one overload is passed the character to be encoded and its index in the string, and the second overload is passed the high and low surrogate along with its index in the string. The DecoderFallbackBuffer.Fallback method is called by the decoder to provide the fallback buffer with information about the bytes that it cannot decode. It is passed an array of bytes that cannot be decoded, along with the index of the first byte. The fallback method should return true if the fallback buffer can supply a best-fit or replacement character or characters; otherwise, it should return false. For an exception fallback, the fallback method should throw an exception.

The EncoderFallbackBuffer.GetNextChar or DecoderFallbackBuffer.GetNextChar method, which is called repeatedly by the encoder or decoder to get the next character from the fallback buffer.

The EncoderFallbackBuffer.Remaining or DecoderFallbackBuffer.Remaining property, which returns the number of characters remaining in the fallback buffer.

The EncoderFallbackBuffer.MovePrevious or DecoderFallbackBuffer.MovePrevious method, which moves the current position in the fallback buffer to the previous character.

The EncoderFallbackBuffer.Reset or DecoderFallbackBuffer.Reset method, which reinitializes the fallback buffer.

If the fallback implementation is a best-fit fallback or a replacement fallback, the classes derived from EncoderFallbackBuffer and DecoderFallbackBuffer also maintain two private instance fields: the exact number of characters in the buffer, and the index of the next character in the buffer to return.

A custom best-fit fallback implementation can provide a better mapping of non-ASCII characters. In the example described here, the fallback class is named CustomMapper and stores its best-fit mappings in a dictionary. To make this mapping available to the fallback buffer, the CustomMapper instance is passed as a parameter to the CustomMapperFallbackBuffer class constructor.

The CustomMapperFallbackBuffer class receives the dictionary of best-fit mappings, which is defined in the CustomMapper instance, through its class constructor. Its Fallback method returns true if any of the Unicode characters that the ASCII encoder cannot encode are defined in the mapping dictionary; otherwise, it returns false.

For each fallback, the private count variable indicates the number of characters that remain to be returned, and the private index variable indicates the position in the string buffer, charsToReturn, of the next character to return. Calling code then instantiates the CustomMapper object and passes an instance of it to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method.
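The description above can be sketched roughly as follows. This is not the original listing: the class names CustomMapper and CustomMapperFallbackBuffer and the fields count, index, and charsToReturn follow the text, but the three dictionary entries, the sample string, and the CustomMapperDemo helper are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class CustomMapper : EncoderFallback
{
    // Hypothetical best-fit table: unmappable character -> ASCII replacement.
    internal readonly Dictionary<char, string> Mapping = new Dictionary<char, string>
    {
        { '\u24C8', "(S)" },  // circled Latin capital letter S
        { '\u2075', "5" },    // superscript five
        { '\u221E', "INF" },  // infinity sign
    };

    public override EncoderFallbackBuffer CreateFallbackBuffer()
        => new CustomMapperFallbackBuffer(this);

    // Longest replacement string in the table above.
    public override int MaxCharCount => 3;
}

public class CustomMapperFallbackBuffer : EncoderFallbackBuffer
{
    private readonly CustomMapper fb;
    private string charsToReturn = string.Empty;
    private int count;  // characters that remain to be returned
    private int index;  // position in charsToReturn of the next character to return

    public CustomMapperFallbackBuffer(CustomMapper fallback) { fb = fallback; }

    // This sketch does not map surrogate pairs.
    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        => false;

    public override bool Fallback(char charUnknown, int index)
    {
        if (count > 0) return false;  // a previous fallback is still being drained
        if (!fb.Mapping.TryGetValue(charUnknown, out var mapped)) return false;
        charsToReturn = mapped;
        count = mapped.Length;
        this.index = 0;
        return true;
    }

    public override char GetNextChar()
    {
        if (count <= 0) return '\0';  // '\0' signals that the buffer is empty
        count--;
        return charsToReturn[index++];
    }

    public override bool MovePrevious()
    {
        if (index == 0) return false;
        index--;
        count++;
        return true;
    }

    public override int Remaining => count;

    public override void Reset()
    {
        charsToReturn = string.Empty;
        count = 0;
        index = 0;
    }
}

public static class CustomMapperDemo
{
    // Encodes to ASCII through the custom fallback, then decodes the result.
    public static string Transliterate(string text)
    {
        Encoding enc = Encoding.GetEncoding(
            "us-ascii", new CustomMapper(), new DecoderExceptionFallback());
        return enc.GetString(enc.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(Transliterate("\u24C8 \u2075 \u221E"));  // prints "(S) 5 INF"
    }
}
```

In this sketch, characters that are not in the table are not replaced by anything, so a fuller implementation would probably supply a default replacement string for them.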

The output indicates that the best-fit fallback implementation successfully handles the three non-ASCII characters in the original string.

Important: The most common problems in encoding operations occur when a Unicode character cannot be mapped to a particular code page encoding.


File.ReadAllText. Namespace: System.IO. Assemblies: mscorlib.

ReadAllText(String): Opens a text file, reads all the text in the file into a string, and then closes the file.

ReadAllText(String, Encoding): Opens a file, reads all text in the file with the specified encoding, and then closes the file.

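To work with specific encodings, import System.IO and System.Text; the System.Text namespace supplies the encodings, such as Encoding.UTF8. The byte order mark (BOM) is a Unicode character at the start of a text stream or file that signals its encoding. To write a file with a particular encoding, pass the encoding as the third argument to File.WriteAllText, after the file path and the file text.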

When reading the file back with File.ReadAllText, we don't need to care about the encoding, because the function detects the encoding by reading the BOM (byte order mark). The result can then be printed with Console.WriteLine.
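A minimal sketch of that write-then-read round trip, assuming a temporary file is acceptable. FileEncodingDemo and WriteThenRead are hypothetical names; the variable names filepath and filetext follow the text above.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileEncodingDemo
{
    // Writes `filetext` with the given encoding, then reads it back without
    // naming an encoding, relying on BOM detection in File.ReadAllText.
    public static string WriteThenRead(string filetext, Encoding encoding)
    {
        string filepath = Path.GetTempFileName();
        try
        {
            // Encoding.UTF8 and Encoding.UTF32 both emit a BOM as their preamble.
            File.WriteAllText(filepath, filetext, encoding);
            return File.ReadAllText(filepath);
        }
        finally
        {
            File.Delete(filepath);
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteThenRead("Gr\u00FC\u00DFe", Encoding.UTF8));
        Console.WriteLine(WriteThenRead("Gr\u00FC\u00DFe", Encoding.UTF32));
    }
}
```

If a file has no BOM and is not UTF-8, the two-argument overload File.ReadAllText(filepath, encoding) lets the caller state the encoding explicitly.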


