Introduction to DICOM - Chapter 6 - Transfer Syntax

Transfer syntax defines how DICOM objects are serialized. When holding an object in memory, the only thing that matters is that your application can use it. The internal representation of the object is your own business. However, when sharing objects with other applications, everyone should be able to use the same object. The common solution for such problems is serialization.

Serialization is the process of writing a data structure or object state to the wire, i.e. in a format that can be stored in a file or memory buffer, or transmitted across a network, so that it can be read on the other side of the wire or later, by the same or by another process.

DICOM does not define shared memory, but it is easy to build using the same mechanism that is used for networking and files alike, i.e. serializing the object into memory according to the rules dictated by the standard, i.e. using a transfer syntax.

In this post I'll cover the following issues:
  • Present the term Transfer Syntax 
  • Why Transfer Syntax is required 
  • What is Transfer Syntax used for 
  • How Transfer Syntax is set when using 
    • DICOM files 
    • DICOM network
So, as I said, the serialization in DICOM is governed by a term called Transfer Syntax.

Transfer Syntax is defined at the object level and is the syntax for serializing a DICOM object. We have already seen transfer syntaxes in chapter 5 when dealing with association negotiation, but did not discuss them. In order for an application to read a DICOM object from a network wire, it has to know the rules that were used to write the object to the wire. In the association request the calling AE sends a list of presentation contexts, each with an abstract syntax identified by a SOP Class UID. For every abstract syntax, the calling AE sends a list of transfer syntax UIDs. In the association response the called AE selects one of the transfer syntax UIDs for every presentation context it accepts.
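
Here is a minimal sketch of the selection step described above, assuming a called AE that simply accepts the first proposed transfer syntax it also supports. The names ProposedContext and selectTransferSyntax are hypothetical and do not come from any toolkit, and a real application may well apply its own preference order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One proposed presentation context: an abstract syntax (SOP Class UID)
// plus the transfer syntaxes the calling AE offers for it.
struct ProposedContext {
    int id;                                     // odd number, 1..255
    std::string abstractSyntax;                 // SOP Class UID
    std::vector<std::string> transferSyntaxes;  // in order of proposal
};

// Hypothetical policy for the called AE: accept the first proposed transfer
// syntax that is also supported locally. Returns "" when nothing matches,
// in which case the presentation context would be rejected.
std::string selectTransferSyntax(const ProposedContext& ctx,
                                 const std::vector<std::string>& supported)
{
    assert(ctx.id % 2 == 1);  // presentation context IDs are odd
    for (const std::string& ts : ctx.transferSyntaxes) {
        if (std::find(supported.begin(), supported.end(), ts) != supported.end())
            return ts;
    }
    return "";
}
```

With the Verification context from the snippet below, a called AE that supports only Little Endian Implicit would answer with 1.2.840.10008.1.2, even though it was proposed last.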


Here's a short snippet from the last post on DICOM networking:

Presentation Contexts:
  Context ID:        1 (Proposed)
    Abstract Syntax: =VerificationSOPClass
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =BigEndianExplicit
      =LittleEndianImplicit

This is part of the association request, and under Proposed Transfer Syntax(es) you see the three transfer syntaxes that the calling application is suggesting for the first presentation context. It suggests the three basic transfer syntaxes:
  • Little Endian Explicit which is defined by the UID: 1.2.840.10008.1.2.1
  • Big Endian Explicit which is defined by the UID: 1.2.840.10008.1.2.2 and 
  • Little Endian Implicit which is defined by the UID: 1.2.840.10008.1.2
The Transfer Syntax UID is a UID that identifies the transfer syntax (that's lame, huh?). Like all the other UIDs it can be found in part 6 of the standard (PS3.6).

Transfer syntax sets exactly three things that are required in order to parse the serialized DICOM object:
  1. Whether VRs are explicit, i.e. whether the data type code of every element is serialized or is implicitly deduced from the element tag (see the post on DICOM Elements).
  2. The order in which the bytes of multi-byte data types are serialized. For example, if we have an element with an unsigned short data type (the Value Representation, VR, is US), then which of the two bytes is written to the buffer first and which is second. 
  3. Whether pixel data is compressed and, if so, which compression algorithm is used. Compressed pixel data transfer syntaxes are always explicit VR little endian (so you can call JPEG baseline 1.2.840.10008.1.2.4.50, for example, "explicit little endian jpeg baseline").
Most DICOM toolkits, and RZDCX is no different, handle the first two items on this list for you: they automatically change the byte order and insert or remove the VR codes. But there are cases when this multitude of choices (and don't ask why we need three serialization syntaxes; instead read the second post in this series) causes problems.
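
To make the three properties concrete, here is a small lookup sketch. The table is deliberately partial (the three basic syntaxes plus two JPEG ones), and TransferSyntaxInfo and findTransferSyntax are made-up names for illustration, not a toolkit API.

```cpp
#include <cassert>
#include <map>
#include <string>

// The three things a transfer syntax sets.
struct TransferSyntaxInfo {
    bool explicitVR;    // VR codes are written into the stream
    bool littleEndian;  // byte order of multi-byte values
    bool compressed;    // pixel data is an encapsulated compressed stream
};

// Returns the properties for a known UID, or nullptr for an unknown one.
const TransferSyntaxInfo* findTransferSyntax(const std::string& uid)
{
    static const std::map<std::string, TransferSyntaxInfo> table = {
        {"1.2.840.10008.1.2",      {false, true,  false}},  // Little Endian Implicit
        {"1.2.840.10008.1.2.1",    {true,  true,  false}},  // Little Endian Explicit
        {"1.2.840.10008.1.2.2",    {true,  false, false}},  // Big Endian Explicit
        {"1.2.840.10008.1.2.4.50", {true,  true,  true}},   // JPEG baseline
        {"1.2.840.10008.1.2.4.70", {true,  true,  true}},   // JPEG lossless (Process 14, SV1)
    };
    std::map<std::string, TransferSyntaxInfo>::const_iterator it = table.find(uid);
    if (it == table.end())
        return nullptr;
    // Compressed syntaxes are always explicit VR little endian.
    assert(!it->second.compressed || (it->second.explicitVR && it->second.littleEndian));
    return &it->second;
}
```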

I'm going to leave compression for a later chapter, but in short, DICOM defines many compressed transfer syntaxes. In these, the compressed image stream is simply encapsulated in the pixel data element of the DICOM object. So one can actually open a DICOM file with a binary editor, locate the pixel data element (7FE0,0010), cut out the compressed stream, save it as a jpeg file, double click it and see it. Maybe we'll do it together when talking about compression.

Let's now do an example that shows some issues that you may find confusing. Let's say we have two images, both CT, but one of them we have compressed with jpeg lossless compression, and we would like to send them both to an archive.

This negotiation is rather strange because one can, for example, negotiate two presentation contexts (IDs 1 and 3, remember?) in the following way:

1) CT Image storage, explicit little endian
3) CT Image storage, jpeg lossless
In this example the calling application requests to send a CT image and a compressed CT image.
The request could have been composed this way as well:
1) CT Image storage, (explicit little endian, jpeg lossless)
But this is different, because the called AE will select one of the suggested transfer syntaxes and the calling AE will have to send all CT images according to the selected transfer syntax, either encoding them all before sending or decoding them all, depending on which transfer syntax the called AE has selected. If your application can't do this compression on the fly, you may get calls from the field. Using the first negotiation, however, the called AE will most likely accept both 1 and 3, and we can send the uncompressed images using context id 1 and the jpeg compressed DICOM images using context id 3. I hope you don't mind me saying that RZDCX takes care of all this for you.
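
The difference between the two negotiations can be sketched in a few lines. This is only an illustration under simple assumptions: AcceptedContext, SendPlan and planSend are hypothetical names, and the sender is assumed to prefer a context that matches the stored transfer syntax and to fall back to transcoding otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A presentation context as accepted by the called AE.
struct AcceptedContext {
    int id;                      // odd context ID
    std::string abstractSyntax;  // SOP Class UID
    std::string transferSyntax;  // the one syntax the called AE selected
};

// Which context to send on, and whether the image must be re-encoded first.
// contextId 0 means no accepted context fits the SOP class.
struct SendPlan {
    int contextId;
    bool mustTranscode;
};

SendPlan planSend(const std::vector<AcceptedContext>& accepted,
                  const std::string& sopClass,
                  const std::string& storedTransferSyntax)
{
    SendPlan plan = {0, false};
    for (const AcceptedContext& ctx : accepted) {
        assert(ctx.id % 2 == 1);  // presentation context IDs are odd
        if (ctx.abstractSyntax != sopClass)
            continue;
        if (ctx.transferSyntax == storedTransferSyntax) {
            plan.contextId = ctx.id;  // exact match: send as stored
            plan.mustTranscode = false;
            return plan;
        }
        if (plan.contextId == 0) {    // remember a fallback that needs transcoding
            plan.contextId = ctx.id;
            plan.mustTranscode = true;
        }
    }
    return plan;
}
```

With contexts 1 and 3 both accepted, the jpeg lossless image goes out on context 3 untouched. With the single combined context, it has to be decompressed first if the called AE picked explicit little endian.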

Most issues with transfer syntax are related to applications that don't support transfer syntaxes that others require. If you have one application that can read only big endian and another that is limited to little endian, they will never talk to one another. That's an extreme case, but there are many applications that don't support any compressed images, or can only store them but not display them, and if your application generates only jpegs you'd better rethink the design.
Transfer syntax issues sometimes cause images to look bad. I've seen applications that change the byte order (big/little endian) of the pixel data without changing the transfer syntax accordingly, causing the images to be unreadable. Such images usually feature jagged edges and bad contrast. I've also seen a very popular CD burner that, if interfaced with implicit syntax, causes many elements to become of Unknown VR even if these are well defined elements.
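
A tiny sketch shows why a byte-order mismatch wrecks 16-bit pixel data. The helper readU16 is a made-up name for illustration; it reads the same two bytes under either byte order.

```cpp
#include <cassert>
#include <cstdint>

// Reads one 16-bit value from two bytes using the given byte order.
std::uint16_t readU16(const unsigned char* p, bool littleEndian)
{
    assert(p != nullptr);
    return littleEndian
        ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
        : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

For example, the bytes 64 00 are the pixel value 100 when read as little endian, but 25600 when read as big endian, so a reader that trusts the wrong transfer syntax sees wildly different intensities.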

Now let's move to DICOM files. Just like in DICOM networking, DICOM files must be read by all applications, so that's serialization too. Here are the rules for DICOM files:

When writing a DICOM object to file, the application that creates the file writes it in the following way:
  1. The first 128 bytes are null (0x00) 
  2. Bytes 128 - 131 (zero based) are 'DICM' which is the DICOM magic number 
  3. Add to the object a file meta header - a group of elements of group 0002 that are the first elements in the object. 
  4. Group 0002 is written in Little Endian Explicit 
  5. Element (0002,0010) is the Transfer Syntax UID that is used for all the elements other than group 2.
So, when reading a DICOM file, a DICOM library should do this:

  1. Read 132 bytes (the 128-byte preamble followed by the 4-byte prefix) and check that bytes 128-131 equal "DICM" 
  2. Start parsing using Little Endian Explicit all the group 0002 elements 
  3. Check the value of element (0002,0010) and use this transfer syntax for the rest of the file.
Important: always remove group 0002 before sending objects over the network. Group 0002 is strictly for DICOM files.
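
The writing and reading rules above can be sketched as follows. This is a simplified illustration, not a conformant writer: the header it builds contains only the group length and the Transfer Syntax UID and omits the other required meta elements. Bytes, buildMinimalHeader and readTransferSyntaxUid are hypothetical names, and the reader only knows the long-length VRs listed in the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical helper type

static void put16(Bytes& b, std::uint16_t v)  // little endian
{
    b.push_back(static_cast<unsigned char>(v & 0xFF));
    b.push_back(static_cast<unsigned char>(v >> 8));
}

static std::uint16_t get16(const Bytes& b, std::size_t i)  // little endian
{
    return static_cast<std::uint16_t>(b[i] | (b[i + 1] << 8));
}

// Writes the preamble, 'DICM', (0002,0000) and (0002,0010) in Little Endian Explicit.
Bytes buildMinimalHeader(std::string tsUid)
{
    if (tsUid.size() % 2 != 0) tsUid.push_back('\0');  // UI values are NUL-padded to even length
    assert(tsUid.size() <= 64);                        // a UID is at most 64 characters
    Bytes f(128, 0);                                   // 128 null bytes
    const char prefix[] = "DICM";
    f.insert(f.end(), prefix, prefix + 4);
    const std::uint32_t groupLength = static_cast<std::uint32_t>(8 + tsUid.size());
    put16(f, 0x0002); put16(f, 0x0000);                // (0002,0000) group length
    f.push_back('U'); f.push_back('L'); put16(f, 4);
    put16(f, static_cast<std::uint16_t>(groupLength & 0xFFFF));
    put16(f, static_cast<std::uint16_t>(groupLength >> 16));
    put16(f, 0x0002); put16(f, 0x0010);                // (0002,0010) Transfer Syntax UID
    f.push_back('U'); f.push_back('I');
    put16(f, static_cast<std::uint16_t>(tsUid.size()));
    f.insert(f.end(), tsUid.begin(), tsUid.end());
    return f;
}

// Returns the value of (0002,0010), or "" if the buffer is not a DICOM file
// or the element is missing.
std::string readTransferSyntaxUid(const Bytes& file)
{
    if (file.size() < 132 || std::memcmp(&file[128], "DICM", 4) != 0)
        return "";
    std::size_t pos = 132;
    while (pos + 8 <= file.size() && get16(file, pos) == 0x0002) {
        const std::uint16_t element = get16(file, pos + 2);
        const std::string vr(file.begin() + pos + 4, file.begin() + pos + 6);
        std::size_t headerSize = 8;
        std::size_t length = get16(file, pos + 6);
        if (vr == "OB" || vr == "OW" || vr == "OF" || vr == "SQ" || vr == "UT" || vr == "UN") {
            if (pos + 12 > file.size()) break;         // 2 reserved bytes + 4-byte length
            headerSize = 12;
            length = get16(file, pos + 8) | (static_cast<std::size_t>(get16(file, pos + 10)) << 16);
        }
        const std::size_t start = pos + headerSize;
        if (start + length > file.size()) break;
        if (element == 0x0010) {
            std::string uid(file.begin() + start, file.begin() + start + length);
            while (!uid.empty() && uid.back() == '\0') uid.pop_back();  // drop padding
            return uid;
        }
        pos = start + length;
    }
    return "";
}
```

A reader would then use the returned UID to decide how to parse everything after group 0002.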


Before summarizing, here's a very detailed explanation of what a DICOM file actually looks like at the byte level. The screenshot below shows three DICOM files of exactly the same object opened in a binary editor. Each file was saved with a different transfer syntax using this simple code:


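      // Save the same object three times, once per basic transfer syntax.
      // obj and filename are assumed to be defined earlier.

      // Big Endian Explicit
      string BEEfname(filename);
      BEEfname += ".bee.dcm";
      obj->TransferSyntax = TS_BEE;
      obj->saveFile(BEEfname.c_str());

      // Little Endian Explicit
      string LEEfname(filename);
      LEEfname += ".lee.dcm";
      obj->TransferSyntax = TS_LEE;
      obj->saveFile(LEEfname.c_str());

      // Little Endian Implicit
      string LEIfname(filename);
      LEIfname += ".lei.dcm";
      obj->TransferSyntax = TS_LEI;
      obj->saveFile(LEIfname.c_str());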



Up to the highlighted part, the files are identical except for the value of the transfer syntax UID element in the file meta header. You can see the 128 zeros and the DICM and then the elements of group 0002. In all three files this part is little endian explicit and you can see the VR codes UL and then OB just after the preamble.

The highlighted part is the first data element of the object itself, which is element (0008,0005). While in the Little Endian files (left and right) the bytes are ordered 08 00 05 00, in the big endian file (center) the order is 00 08 00 05.


Then, in the explicit VR files (left and center) the tag is followed by 'CS' which is the VR code of this element. CS stands for Code String and tells us the data type of this element. In the implicit VR file this code is missing. It is implicitly specified by the tag. A given tag always has the same type (this tag is called Specific Character Set and it is the code string of the character set encoding for the strings in the file).

After that we have the data element length, which is 0xA (meaning the value is 10 bytes long), and then the value itself, 'ISO_IR 192', which means that the strings in this DICOM file are encoded using UTF-8. Note that in the explicit VR files the length is stored in two bytes while in the implicit VR file the length is stored in four bytes (quiz: though not very important, can you guess why?).
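
The byte layouts described above can be reproduced with a short sketch. It handles only elements with a short string VR such as CS, where the explicit form uses a two-byte length; Bytes and encodeShortElement are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical helper type

// Appends a 16-bit value in the requested byte order.
static void put16(Bytes& out, std::uint16_t v, bool littleEndian)
{
    const unsigned char lo = static_cast<unsigned char>(v & 0xFF);
    const unsigned char hi = static_cast<unsigned char>(v >> 8);
    out.push_back(littleEndian ? lo : hi);
    out.push_back(littleEndian ? hi : lo);
}

// Appends a 32-bit value in the requested byte order.
static void put32(Bytes& out, std::uint32_t v, bool littleEndian)
{
    const std::uint16_t low = static_cast<std::uint16_t>(v & 0xFFFF);
    const std::uint16_t high = static_cast<std::uint16_t>(v >> 16);
    put16(out, littleEndian ? low : high, littleEndian);
    put16(out, littleEndian ? high : low, littleEndian);
}

// Encodes tag, optional VR code, length and value of one short string element.
Bytes encodeShortElement(std::uint16_t group, std::uint16_t element,
                         const std::string& vr, const std::string& value,
                         bool explicitVR, bool littleEndian)
{
    assert(vr.size() == 2);
    assert(value.size() % 2 == 0);  // DICOM values always have even length
    Bytes out;
    put16(out, group, littleEndian);
    put16(out, element, littleEndian);
    if (explicitVR) {
        out.push_back(static_cast<unsigned char>(vr[0]));
        out.push_back(static_cast<unsigned char>(vr[1]));
        put16(out, static_cast<std::uint16_t>(value.size()), littleEndian);
    } else {
        put32(out, static_cast<std::uint32_t>(value.size()), littleEndian);
    }
    out.insert(out.end(), value.begin(), value.end());
    return out;
}
```

Encoding (0008,0005) with the value 'ISO_IR 192' gives 08 00 05 00 'C' 'S' 0A 00 in Little Endian Explicit, 00 08 00 05 'C' 'S' 00 0A in Big Endian Explicit and 08 00 05 00 0A 00 00 00 in Little Endian Implicit, each followed by the ten value bytes.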

Let's summarize:
  1. Serialization of DICOM objects is governed by Transfer Syntax 
  2. Transfer syntax sets: 
    • The byte order (little/big) 
    • If VR's are serialized (explicit/implicit) 
    • If pixel data is compressed or not (if compressed, the byte order is always little endian and VRs are always explicit) 
  3. In DICOM networking, the transfer syntax is selected per object type (SOP Class) at the negotiation phase 
  4. In DICOM files the transfer syntax is set in the File Meta Header (group 0002)
Recommendations as far as transfer syntax goes:
  1. Always support and propose all 3 basic transfer syntaxes: LEI, LEE and BEE
  2. If possible, always prefer LEE as your default.
With RZDCX you are spared from bothering with all these details. The transformations between transfer syntaxes are taken care of internally, as is the selection of transfer syntaxes during association negotiation and when reading and writing files. You can control the transfer syntax when saving files, as shown in the detailed example above, and also compress and decompress, but unless there's a very specific requirement about that in your application, you will probably never have to deal with it.

One last comment. Many times I'm asked what transfer syntax is used by some application internally, i.e. when some application, e.g. a PACS, writes files in its internal storage, how they are stored. My answer to that is that I don't know and that you shouldn't care. Never assume anything about the internals of an application. The only thing that matters is its interfaces.

As always, comments and questions are most welcome.

3 comments:

  1. Why recommended to propose or support BEE? Almost all systems accept LEE and LEI is required to be supported.
    It only makes testing more difficult.

  2. Hello there,
    Nice series of posts about DICOM. Those are a "what every DICOM developer should know about DICOM", I like it.
    @Victor : I guess this recommendation goes in the way of backward compatibility: there are still a lot of old platforms in the field that are natively big endian speakers, as you may know.

    1. Thanks. Supporting all three LEI, LEE, BEE costs nothing. On the other hand, omitting any of them may result in failures to communicate with some systems, so why not support it?
