Saturday, September 27, 2014

Video to DICOM (and back)

This is a story of a lost battle. For many years I refused to add MPEG-to-DICOM functionality to my DICOM SDK. The explanation I gave to myself and to my customers was that storing video in PACS is a bad idea because video streams are usually very big and nobody ever watches them. From an engineering point of view, the size of the video is not so much a matter of disk space but rather a network headache. The way the DICOM network protocol works, with all its different levels of timeouts and no failover mechanism for PDUs, may cause such huge objects to fail over and over when stored and retrieved. From the clinical point of view, I consulted radiologist friends, from whom I learned that the driving force behind keeping most of this stuff is not clinical but rather medico-legal. These excuses held for some time, but eventually, because I’m an engineer but also a businessman, I changed my mind. After all, the customer is always right, and when more and more customers asked to convert video to DICOM, I realized that winning this battle meant losing customers, and that’s not something a businessman should do.

Videos were added to DICOM through the mechanism of Transfer Syntax. Altogether there are currently four video transfer syntaxes for different types of MPEG. Here's the list of these transfer syntaxes:
  • MPEG2 Main Profile @ Main Level : "1.2.840.10008.1.2.4.100"
  • MPEG2 Main Profile @ High Level : "1.2.840.10008.1.2.4.101"
  • MPEG-4 AVC/H.264 High Profile / Level 4.1 : "1.2.840.10008.1.2.4.102"
  • MPEG-4 AVC/H.264 BD-compatible High Profile / Level 4.1 : "1.2.840.10008.1.2.4.103"


If I have to guess, more will probably be added in the future as new video formats take over. The embedded document option that was taken for PDF would probably have been my choice, but I admit that I didn’t investigate the reasons that led the standard to go the way it did, and there may have been good reasons to do it this way. Anyway, it’s done now, so we had better learn how to use it and not ask too much why. Engineering is not mathematics; there’s more than one correct answer. The support for video evolved in the DICOM standard through a series of supplements (sup42, sup137 and sup149). The DICOM Standard limits the choice of encodings to MPEG2 and MPEG4 and further limits the allowed resolutions, frame rates and other video parameters. Reading this section of the standard reminded me a lot of part 11 of the standard, which details media interchange and media profiles. Eventually, implementers just stick to the very basics of these sections.

According to the standard, the video properties should be set in the relevant DICOM elements of the image group (0x0028), like rows, columns, number of frames (calculated from the length in seconds multiplied by the frame rate), frame duration (1000/FPS milliseconds) and so on. What I find most important is that the standard tells implementers that the encapsulated video properties take precedence when there’s a conflict between the properties that are part of the encapsulated video stream data and the properties stated in the DICOM elements. I think this was a wise decision. It allows implementers to just take the video stream out of the DICOM envelope and pass it to their video player or the OS to deal with. If it’s a valid video stream and you have the codec for it, then it will play, and if not, then the parameters in the DICOM elements are not going to change much about that. The only good that I can see in having this data stated in the DICOM elements is to provide some heads-up of what’s inside.
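The arithmetic behind the frame duration and the number of frames can be sketched in a few lines of C++. This is only an illustration: the helper names `frameDurationMs` and `numberOfFrames` are hypothetical and are not part of RZDCX or of the DICOM standard.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: frame duration in milliseconds for a given
// frame rate, i.e. 1000 / FPS.
double frameDurationMs(double fps) {
    return 1000.0 / fps;
}

// Hypothetical helper: number of frames in a stream, i.e. the length in
// seconds multiplied by the frame rate. The length is passed in
// milliseconds, hence the division by 1000. The result is rounded to the
// nearest whole frame.
std::int64_t numberOfFrames(std::int64_t lengthMs, double fps) {
    const double frames = static_cast<double>(lengthMs) * fps / 1000.0;
    return static_cast<std::int64_t>(std::llround(frames));
}
```

At 25 FPS the frame duration is 40 ms, and 64 seconds of video at that rate come to 1600 frames.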

If we look at this from an implementation point of view, what it actually means is that when we come to display or otherwise use the video stream, we should disregard the DICOM attributes and simply read the stream and play it. The implementation then becomes quite simple, and a flow like the following can work:
1. If the Transfer Syntax is one of the four transfer syntaxes DICOM defines for video streams:
a. Get the pixel data,
b. Take the first fragment from the encapsulated pixel data sequence,
c. Feed this data into your Video SDK.
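The first step of this flow, deciding whether an object carries video, could look something like the following C++ sketch. It checks a transfer syntax UID against the four UIDs listed earlier in this post. The function name `isVideoTransferSyntax` is made up for illustration and is not an RZDCX call. The sketch assumes you have already read the Transfer Syntax UID from the file meta header with whatever toolkit you use.

```cpp
#include <array>
#include <string>

// The four video transfer syntax UIDs listed in this post.
static const std::array<const char*, 4> kVideoTransferSyntaxes = {{
    "1.2.840.10008.1.2.4.100",  // MPEG2 Main Profile @ Main Level
    "1.2.840.10008.1.2.4.101",  // MPEG2 Main Profile @ High Level
    "1.2.840.10008.1.2.4.102",  // MPEG-4 AVC/H.264 High Profile / Level 4.1
    "1.2.840.10008.1.2.4.103"   // MPEG-4 AVC/H.264 BD-compatible High Profile / Level 4.1
}};

// Hypothetical helper: returns true if the given transfer syntax UID is one
// of the four video transfer syntaxes above.
bool isVideoTransferSyntax(std::string uid) {
    // UI values with an odd length are padded with a trailing NUL byte in
    // the file, so strip any such padding before comparing.
    while (!uid.empty() && uid.back() == '\0') {
        uid.pop_back();
    }
    for (const char* video : kVideoTransferSyntaxes) {
        if (uid == video) {
            return true;
        }
    }
    return false;
}
```

If the check passes, the remaining steps are just a matter of pulling the bytes out of the pixel data and handing them to a video player.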

For those who create the videos and want to put them into a DICOM object: if you want to be friendly (who doesn’t?), then you should encode your video according to the specifications of the standard. But what if you don’t? Then what? Good question! Let’s look at DICOM encapsulated JPEGs. What most applications do is exactly the same: take the JPEG stream out of the encapsulated fragment and put it into the programming platform’s Image object. One thing that can be said about implementations that don’t follow these simple steps is that they’re probably much slower to display the image and probably less tolerant of different JPEG flavors. Let’s face it: Microsoft, Apple and the open source community’s engineers invested much more in their JPEG implementations than all medical device R&D teams together. DICOM in this case is not much more than an envelope, a binder. It attaches together Patient, Procedure and image objects. It’s very similar in a way to EXIF, which adds GPS coordinates, camera model and capture parameters like focal length, aperture and so on to an image taken by a digital camera or a smartphone.

Let’s take a break from all this contextual discussion, switch to the context-free world and see how we implemented this. Here are two code snippets for converting MPEG to DICOM using RZDCX.

Here's the sample C++ code:
/// Create a DCXOBJ
IDCXOBJPtr obj(__uuidof(DCXOBJ));
/// Provide the properties of the video stream
rzdcxLib::ENCAPSULATED_VIDEO_PROPS videoProps;
videoProps.width = 352;
videoProps.Height = 288;
videoProps.PixelAspectRatioX = 4;
videoProps.PixelAspectRatioY = 3;
videoProps.FrameDurationMiliSec = 40; // 40 msec = 25 FPS
videoProps.NumberOfFrames = 1600; // 1600 frames
videoProps.VideoFormat = rzdcxLib::MPEG2_AT_MAIN_LEVEL;

obj->SetVideoStream(filename.c_str(), videoProps);

And here's the equivalent C# code snippet:
        [Test]
        public void Test_ConvertMpegToDICOM()
        {
            DCXOBJ obj = new DCXOBJ();
            ENCAPSULATED_VIDEO_PROPS props;
            props.FrameDurationMiliSec = 40;
            props.Height = 360;
            props.width = 640;
            props.NumberOfFrames = 1600;
            props.PixelAspectRatioX = 1;
            props.PixelAspectRatioY = 1;
            props.VideoFormat = VIDEO_FORMAT.MPEG4;
            obj.SetVideoStream(@"test.m4v",props);
            obj.Dump(@"test.m4v.txt");
            obj.saveFile(@"test.m4v.dcm");

        }


One thing that you probably noticed is that we do not examine the video stream data that you provide. We totally rely on you for that. Resolution, encoding, length, frame rate: you should provide it all. Though for JPEG RZDCX does the encoding and decoding for you (it probably shouldn’t), when it comes to video streams, we realized that trying to parse the video streams and comply with their formats may be an endless task. RZDCX is a DICOM toolkit and it should not carry all the weight of video encoding on its back. We assume that customers who use this feature know what their video properties are and already have the code for creating, writing and reading the video, so having another video codec as part of the DICOM SDK is simply excess weight. As argued earlier, there are many video codecs out there for all platforms and programming languages.

What about sound? DICOM recommends that applications should be able to play the sound track that may be part of the embedded video.
 
Getting the video from the DICOM object is just like getting encapsulated JPEG frames. The video should always be stored in one fragment, so that’s quite simple, as in the following code:

            // Decode example
            DCXOBJ o2 = new DCXOBJ();
            o2.openFile(@"test.m4v.dcm");
            int len = o2.GetEncapsulatedFrameLength(0);
            byte[] data = new byte[len];
            fixed (byte* p = data)
            {
                UIntPtr p1 = (UIntPtr)p;
                o2.GetEncapsulatedFrameData(0, (uint)p1, len);
            }
            FileStream fs = new FileStream(@"test.extracted.m4v",
                FileMode.Create, FileAccess.Write);
            fs.Write(data,0,len);
            fs.Close();
            fs.Dispose();
One last, very important note has to be made about the video length. DICOM requires element values to have an even length, so if your original video has an odd number of bytes, RZDCX automatically adds a 0x00 byte at the end.
Here is our test for encoding and decoding that takes this into account:
       [Test]
        public unsafe void Test_ConvertMpegToDICOM()
        {

            // Encode
            DCXOBJ o1 = new DCXOBJ();
            ENCAPSULATED_VIDEO_PROPS props;
            props.FrameDurationMiliSec = 40;
            props.Height = 360;
            props.width = 640;
            props.NumberOfFrames = 1600;
            props.PixelAspectRatioX = 1;
            props.PixelAspectRatioY = 1;
            props.VideoFormat = VIDEO_FORMAT.MPEG4;
            o1.SetVideoStream(@"test.m4v",props);
            o1.Dump(@"test.m4v.txt");
            o1.saveFile(@"test.m4v.dcm");

            // Decode example
            DCXOBJ o2 = new DCXOBJ();
            o2.openFile(@"test.m4v.dcm");
            int len = o2.GetEncapsulatedFrameLength(0);
            byte[] data = new byte[len];
            fixed (byte* p = data)
            {
                UIntPtr p1 = (UIntPtr)p;
                o2.GetEncapsulatedFrameData(0, (uint)p1, len);
            }
            FileStream fs = new FileStream(@"test.extracted.m4v",
                FileMode.Create, FileAccess.Write);
            fs.Write(data,0,len);
            fs.Close();
            fs.Dispose();

            byte[] original = File.ReadAllBytes(@"test.m4v");
            if (original.Length%2 == 1)
            {
                byte[] copyOfOrig = new byte[original.Length+1];
                original.CopyTo(copyOfOrig,0);
                copyOfOrig[original.Length] = 0x00;
                Assert.That(copyOfOrig.SequenceEqual(data));
            }
            else
                Assert.That(original.SequenceEqual(data));
        }
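
The padding rule that the test above accounts for can also be shown on its own. The following C++ sketch is not RZDCX code; `padToEvenLength` is a hypothetical helper that only illustrates the even-length rule described in this post.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of the stream padded to an even
// number of bytes. A single 0x00 byte is appended when the length is odd.
// Streams that already have an even length are returned unchanged.
std::vector<std::uint8_t> padToEvenLength(std::vector<std::uint8_t> bytes) {
    if (bytes.size() % 2 != 0) {
        bytes.push_back(0x00);
    }
    return bytes;
}
```

Note that the extracted stream alone does not tell you whether a trailing 0x00 byte was part of the original video or was added as padding, which is why the test compares against a padded copy of the original.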

To summarize:
  1. You can recognize a DICOM file carrying a video stream by checking the transfer syntax in the DICOM file meta header
  2. The video stream is located in the first fragment of the encapsulated pixel data sequence and should be compliant with the MPEG2 or MPEG4 standard
  3. When converting video to DICOM, make sure the encoding is one that is supported by DICOM, and if not, transcode it to one that is supported

One last note. There are implementations of video to DICOM converters that don't save the video stream according to the standard. Instead, they save the video stream in a native OB pixel data element (without putting it into a sequence). In this case you should just take the entire value of the pixel data and save it as an MPEG stream.
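
For readers who want to see what "the first fragment" means at the byte level, here is a minimal C++ sketch that does not use RZDCX. It assumes the input is the raw value of the encapsulated Pixel Data element as stored in the file: a Basic Offset Table item, followed by fragment items, each introduced by the Item tag (FFFE,E000) and a 4-byte little-endian length. The function names are hypothetical. The sketch only covers the standard encapsulated case, not the native OB case, where the whole value is the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace {

// Reads a little-endian 16-bit value at pos (caller checks bounds).
std::uint16_t readU16(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint16_t>(b[pos] | (b[pos + 1] << 8));
}

// Reads a little-endian 32-bit value at pos (caller checks bounds).
std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Reads an Item header (tag FFFE,E000 plus length) at pos, advances pos to
// the start of the item value and returns the item length.
std::uint32_t readItemHeader(const std::vector<std::uint8_t>& b, std::size_t& pos) {
    if (pos > b.size() || b.size() - pos < 8) {
        throw std::runtime_error("truncated item header");
    }
    if (readU16(b, pos) != 0xFFFE || readU16(b, pos + 2) != 0xE000) {
        throw std::runtime_error("expected Item tag (FFFE,E000)");
    }
    const std::uint32_t length = readU32(b, pos + 4);
    pos += 8;
    if (length > b.size() - pos) {
        throw std::runtime_error("item runs past the end of the data");
    }
    return length;
}

}  // namespace

// Hypothetical helper: returns the bytes of the first fragment that follows
// the Basic Offset Table. Throws if there is no such fragment.
std::vector<std::uint8_t> firstFragment(const std::vector<std::uint8_t>& encapsulated) {
    std::size_t pos = 0;
    pos += readItemHeader(encapsulated, pos);  // skip the Basic Offset Table
    const std::uint32_t length = readItemHeader(encapsulated, pos);
    const auto begin = encapsulated.begin() + static_cast<std::ptrdiff_t>(pos);
    return std::vector<std::uint8_t>(begin, begin + static_cast<std::ptrdiff_t>(length));
}
```

The returned bytes are what you would write to disk or pass to a video player. A real toolkit does this for you, so treat the sketch as a way to understand the layout rather than as production code.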

8 comments:

  1. Dear Roni,
    what is the maximum file size that can be dicomized? I noticed that dicomizing a video of 0.98 GB works fine, but when dicomizing a video of 1.35 GB or bigger, I get an "External exception E06D7363" upon calling obj.SetVideoStream(filename,props);

    Replies
    1. Hi,
      In 2.0.4.2 the entire video stream is loaded into memory so if it's bigger than your available memory that may be a problem.
      But on the next release we're going to eliminate this limitation so theoretically you will be limited only by your disk space.
      However, creating big DICOM files is not recommended.
      My opinion is that having files larger than 50-100 MB is not a good idea, and I recommend splitting the video into shorter chunks and converting each one as a separate instance of the same series.

    2. Hi again.
      do you already have a schedule when the next version will be released?
      100 MB of a Full HD MPEG video are only about 30 seconds recording time. For 5 minutes of video the file size is 1 GB.
      I think it's acceptable to split long recordings into 1 GB fragments, but I'm not sure it's okay for the customers to split videos into even smaller fragments. Do you have any recommendations?

    3. Dear Anonymous,
      Can you please contact me directly by mail regarding the Large MPEG's
      Thanks
      Roni

  2. What is done about legal stuff? The video could be from any patient..., Only the DICOM wrapper identifies the video. I understand that the videos and pictures do not have the identifying features of DICOM. I would also assume the jpeg is the same way. Am I wrong?

  3. Hi folks,
    It's 2018 and it looks like MPEG-2, MPEG-4 (H.264) and H.265 are supported.
    How popular are these codecs relative to each other in hospitals and for what reasons? Am I right in assuming that MPEG-4 is the most widespread due to the good quality / size ratio and compatibility with typical players (while H.265 is still new)? Or are customers traditional and prefer MPEG-2? What's your experience?
    Thanks a lot!
    Barry

  4. Hi Roni, I'm using the example VideoToDICOM.cs that you supply, based on the code in this blog. I'm using it to extract MPEG2 data from a dicom file. The code extracts the video and saves an MPEG2 file which can be played, but the time associated with the file is nonsense. It is a 3 second video, and will only play for 3 seconds but at the end the time says "00:00:03/22:208.252". This seems to cause problems when I try to read the MPEG-2 into another program for analysis (MATLAB). Any insight into why this might be? Thanks.
