AutoCAD 2004 DWG: Not Encrypted, Honest!
By Martyn Day, editor, CADserver,
July 17, 2003
On April 1st, 2003, the OpenDWG Alliance publicly announced
that its users were having trouble reverse-engineering the new
AutoCAD file format. The reason for the problem, the group
claimed in its press release, is that Autodesk had used complex
compression, together with data encryption techniques: "the
inclusion of data encryption and compression schemes within the
new file format has created serious challenges to DWG data
interoperability."
This is a serious accusation. What’s most significant is not
the mention of file format compression (most file formats are
compressed) but the allegation that the DWG file is actually
"encrypted." All sorts of negative and nefarious connotations
could arise from the suggestion that Autodesk has adopted
encryption; namely, that the company is trying to lock in its
users and keep out the competition, since opening a file would
require every user to possess a valid copy of Autodesk software.
OpenDWG's claims
For those not up on the wonderful world of data
interoperability, the OpenDWG Alliance was born out of the need
of Autodesk's competitors to reverse-engineer AutoCAD's DWG file
format. To become a member, you have to tell the OpenDWG
Alliance all you know about the DWG format, pay a yearly fee and
in return receive the DWG libraries that its programmers
generate. The OpenDWG's goal is to make DWG an open standard
"usable by all programs that need to access valuable DWG data."
Needless to say, with proprietary file formats being seen as a
competitive advantage, Autodesk isn't a member (the de facto
standard, DWG files are estimated to number some 3 billion). In
my talks with Autodeskers though, they wryly point out that they
would join the Alliance if they were to receive libraries of all
the other members' proprietary CAD products in return (those by
Bentley, EDS, PTC, Graphisoft, Nemetschek, ESRI, and CADKEY, to
name a few).
After reading the Alliance's release, I spoke with Evan
Yares, the group's president and executive director. Yares was
convinced that Autodesk had chosen to encrypt the DWG file,
despite assurances from Autodesk representatives that they had
not. He sent me a DWG file with a corresponding Hex dump, with
an explanation of how this demonstrated Autodesk's encryption.
Unfortunately, as I am not a programmer, I found the evidence
somewhat hard to digest.
In an issue of the upFront.eZine newsletter, the OpenDWG
Alliance made specific technical claims concerning the
compression algorithm: "Rather than having a single compression
type, each object type appears to have its own individual
algorithm, with a large number of special cases. Object
compression is controlled by a 32-bit flag, which provides for
billions of possible permutations. OpenDWG has
reverse-engineered the compression algorithms for some objects,
but substantial work remains to be done." And on the encryption
issue, the group stated: "Both the file and section headers are
encrypted, but in different manners from each other. While
OpenDWG has been able to determine the algorithm used for both,
it has not been able to determine if the encryption keys used to
scramble the data will remain static, will change in each point
release of AutoCAD, or will ultimately be changed dynamically
under program control." (See
DWG 2004 - Tougher Than Thought.)
Surely, faced with such a specific and detailed accusation,
Autodesk would respond? But the company appeared to hold back.
Carl Bass, executive vice president of Autodesk's Design
Solutions Division, told me that the OpenDWG Alliance claims
were "total nonsense and hysteria." It seemed that Autodesk was
going to wait it out; in time, Bass expected, the Alliance would
realize there was no such encryption and make a public
announcement or apology to that effect.
The plot thickens
Meanwhile, Montreal-based viewing and mark-up developer
Cimmetry announced that it had support for the AutoCAD 2004 file
format in its forthcoming update to AutoVue 17 - the same DWG
format that the OpenDWG Alliance claimed was encrypted. Cimmetry
had reverse-engineered the file format without any technology
input from Autodesk. While this could not be taken as a sign
that 2004 DWG was not encrypted (Cimmetry has reverse-engineered
both non-encrypted and encrypted file formats in the past), it
was a sign that the format was at least decipherable by an
outside organization. In his CADwire.net commentary, Evan Yares,
this time in his role as industry analyst, chose to doubt the
validity of Cimmetry's claim: "Either Cimmetry told the truth in
their press release, or they lied. I've seen no evidence that
Cimmetry told the truth." I assumed that this comment reflected
his role within the OpenDWG and the extra pressure that group
must have felt on hearing that someone else had cracked 2004
DWG.
I managed to get a copy of the updated 2004 DWG AutoVue to
test it for myself and then showed Yares that Cimmetry had
indeed cracked the 2004 format. Yares then honorably posted a
retraction: "Cimmetry can fairly claim bragging rights to having
delivered the first third-party viewer with AutoCAD 2004
support. I definitely owe them this recognition, and my
congratulations." It is fair to say, though, that
reverse-engineering a format well enough to read it is only
halfway there; the OpenDWG would have to be able to write 2004
DWGs as well.
All this time, Autodesk had remained silent. There were times
past when I would have expected Autodesk to have fired off a few
lawyers’ letters, or at least produced a qualified rebuttal.
When I asked to interview Carl Bass on the subject of
encryption, he agreed, and we were joined by John Sanders, executive VP of Design Solutions Division, and Mark Strassman,
director of marketing for AutoCAD.
Autodesk responds
I first asked for Autodesk's reaction to OpenDWG's claim that
2004 DWG was encrypted. Bass replied, "The whole thing is
actually pretty complex and I think people like Evan (Yares)
have muddled the facts. There is an element of truth in many of
the things they say but the gist of their argument is not
correct. We changed the DWG file format for the customer's
benefit. A lot of this benefit was around network performance
and compression. We made no secret of the fact that we were
compressing files and not very differently from something like a
Zip file. We didn't use the same algorithm as Zip, but we did
use a relatively standard and well-known compression method. As
is often the case in computer problems, there is a trade-off
between size and speed. Because of the interactive nature of a
program like AutoCAD vs. zipping/unzipping files for email or
archiving, we selected an algorithm that was optimized for
performance."
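The size-versus-speed trade-off Bass describes can be seen with any general-purpose compressor. The sketch below uses Python's standard zlib module purely as a stand-in; it is not the algorithm used in the 2004 DWG, and the sample "drawing" data is invented for illustration.

```python
import zlib

# Hypothetical stand-in for drawing data: repetitive records compress well.
data = b"LINE 0,0 1,1 color=red\n" * 5000

# Level 1 favors speed, level 9 favors a smaller result.
fast = zlib.compress(data, 1)
small = zlib.compress(data, 9)

# Compression is lossless: either result decompresses to the original bytes.
restored_fast = zlib.decompress(fast)
restored_small = zlib.decompress(small)

print(len(data), len(fast), len(small))
```

An interactive program that opens and saves files constantly would tend toward the faster setting, while an archiving tool can afford the slower one.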
Strassman added, "The smaller file size is one of the big
features of 2004. One of the things Evan said is that we use
different compression types for individual features and stuff
like that, which is just false. We use a standard compression
library throughout the DWG file. It's standard compression,
there's no encryption. Compression just makes the file smaller."
Bass said that files were growing larger and larger and
sharing them was beginning to overtax networks. "Now, if users
only have to move 3 to 5 megabytes instead of 10, that's
obviously better for everyone on the network. Just as we do when
we send people big files, we often compress them, that's why
people send JPEGs around instead of TIFFs and another reason why
we came up with DWF. It's all about moving the information
around more efficiently. We surveyed the customers and a vast
majority run AutoCAD, or AutoCAD-based products, on a network,
and this drastically improved the Open, Close and Save
performance across a network. Compression is about better
performance over networks - pure and simple. But if what we were
doing was only that, you could decompress it at the other end
and look at a file that was a 2002 file. We hadn't changed the
file format in a while, because to do that means you have to
re-architect it. On the first release you do great. The second
one gets a little bit messy, the third one gets pretty crusty
and then you need to clean it up. And it's worth knowing where
you are going for the next several releases, when you don't want
to change file format. That's always been a barrier for users.
So it makes sense to put in place an architecture that allows
for that kind of extensibility."
John Sanders added, "Hopefully, we have a foundation for the
next couple of releases so we won't need to change the format."
While that sounds a reasonable explanation, Autodesk changes
its file format, on average, every 3 releases, which doesn't
seem to be very forward looking. I asked Bass why this was the
case when competitors such as Bentley managed to keep the same
format for 15 years.
"If you go into Microsoft Word and you pull down the Save As
menu, there are about 7 different file formats to choose from.
You change formats because you have to add functionality. I
really don't know enough to comment on Bentley."
Strassman explained, "A lot of the changes in the past have
been considered minor because we were just stuffing more
information into the DWG file format. This time, we wanted to
make it so the file would really have a future, so we could
eventually add new features to AutoCAD without changing the file
format."
But wasn't that what the Object ARX programming language was
developed for? "That's what ARX did from a code standpoint, but
not from a data standpoint," Bass replied. "We've just made a
much more flexible way to add data for us and for our
third-party developers. They have a better mechanism to get at
the data. So that's what that was all about, it's all geared
toward customer benefit. That's why we changed the file."
Strassman expanded on this point: "We also did a bunch of
other things to make it easier to recover files, to make
it easier to see when a file is corrupted. We added all sorts of
things for the functionality of the user, which will hopefully
allow us to avoid changing the file format significantly in the
future."
"Mark brought up an important point," added Bass. "People
have always wanted a reliable 'recover' command. AutoCAD has always
done a reasonably good job but it has never been 100% perfect in
that area. But data recovery is a huge user request, especially
because people get corrupt DWGs written by third-party DWG
libraries - which is ironic in the context of this conversation.
The problem is compounded when an architect, for example,
creates a file, sends it to a consulting engineer and they
apparently corrupt the file and then they want to be able to
recover it and get the information back. So we have put in more
mechanisms to make sure we could actually provide a recover in
more circumstances."
Encryption vs. encoding
On broaching the encryption issue, I reiterated the OpenDWG's specific arguments concerning their findings, namely that
the file headers are scrambled with a 128-byte magic number.
This statement incited rebukes all around from my interviewees.
Bass and Sanders both chimed in, "Wrong, wrong, wrong!",
while Strassman added, "There is no magic number."
"They will find out later," said Bass. "That's why I haven't
been particularly interested in responding because the
competitors just look incompetent."
What about the other OpenDWG allegation - that the
sub-headers in the file are encrypted with a 4-bit key? "There
is no encrypting," Bass replied. "None. There are no keys, they
are wrong. I'm more than happy to make a statement about it not
being encrypted, except for the password protection. We actually
believe that customers should have total control over their
data. As an example, we had a choice when it came to putting
encryption for the password protection in the file, whether or
not to have a back door to access the data and we decided not
to. There is nothing special that we can do to that file that a
user can't do. It's totally down to user control. We believe
fundamentally that users have a right to control their data."
The ownership of data is an interesting point. Should users
only have access to their data through an Autodesk product? Bass
replied, "If they created it in a DWG file format that's not
from an Autodesk product, I don't think we are that involved in
this question, right? If they created the DWG in MicroStation,
we would have no real involvement in that conversation, they
just happen to have chosen our format but we have no obligation,
one way or another. The relationship is entirely between that
user and Bentley."
If a non-Autodesk customer was given a DWG file to work with,
does Autodesk believe that person should buy a copy of its CAD
software to open it? "No, it's up to them to look along the axis
of price, features and fidelity," answered Bass. "I think that
person is someone we don't have an obligation to. Are we
interested in them becoming a customer? - that's a different
question than whether we have an obligation to provide them with
free software or with certain functionality." Bass added that he
was actively considering allowing an independent third-party to
reverse-engineer the DWG file format to adjudicate on the issue.
He said he wouldn't pay for it to be done but the doubters
could. To offer DWG up for independent analysis would be a
foolish thing to do if 2004 were indeed encrypted. I took this
alone to be a sign that Bass was willing to put his head on the
block to state that AutoCAD 2004 DWG was not encrypted.
So, on the question of encryption, an emphatic "No" from
Autodesk - along with some bruising comments on the capabilities
of the OpenDWG Alliance. Speaking of which, I've seen
correspondence between Autodesk and the OpenDWG, in which the
former admitted there was some "simple encoding" in the new DWG.
I asked Mark Strassman what this meant.
"There is a big difference between encoding and encryption,"
he said. "Encoding, when used in a software development context,
typically means translating some concept to a digital form for
use by the computer. For example, ASCII is an encoding scheme
for the English alphabet and punctuation. In ASCII, the letter
'A' is encoded as the value 65, or 1000001 binary. In fact,
ASCII stands for ‘American Standard Code for Information
Interchange.’ Thus, letters placed into this "code" are encoded
in ASCII form. Similarly, in AutoCAD we have to translate things
like geometry and attributes into a digital code to be
interpreted by the computer and stored on the hard drive. DXF is
one form of encoding. DWG is another. So, the concept of a red
line from 0,0 to 1,1 would be encoded as some series of binary
numbers in the DWG. This is the manner in which we used
‘encoding’ in our original email with Evan's team.
"Encoding is a word in common usage in software engineering.
Unlike encryption, encoding does not imply any attempt to hide
or obfuscate information. Because laypersons occasionally
confuse the usage of ‘encoding’ and ‘encryption’ we have stopped
using the term encoding when referring to the DWG. The only
encryption in the AutoCAD 2004 DWG is file password protection,
which is totally under the control of the user and is there to
allow for secure transmission of drawings solely at the user's
discretion."
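Strassman's ASCII example is easy to check. The following is a minimal Python sketch of encoding in his sense: a published, reversible mapping with no key and nothing hidden. The helper names are made up for this illustration.

```python
def ascii_encode(text: str) -> bytes:
    """Translate characters to their ASCII code values."""
    return text.encode("ascii")


def ascii_decode(codes: bytes) -> str:
    """Translate ASCII code values back to characters."""
    return codes.decode("ascii")


codes = ascii_encode("A")
value = codes[0]
binary = format(value, "b")

print(value, binary)  # 65 1000001
```

Anyone with the ASCII table can reverse the mapping, which is the distinction from encryption that Strassman is drawing.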
OpenDWG support
On May 16th, the OpenDWG Alliance announced official support
for the 2004 variant of the AutoCAD DWG file format. In the
press release Evan Yares stated that, "Although we had no
significant concerns about being able to implement support for
the AutoCAD 2004 DWG file format, there were enough variables
that the task was not trivial." This statement, to me, is a bit
rich since those "insignificant concerns" had generated an
attacking press release only the month previous. The release went
on to claim that, "In AutoCAD 2004 DWG, a comprehensive
compression algorithm is applied to almost all data structures,
and the file and section headers are encrypted using a
magic-number/XOR algorithm." A cautionary note to users added
that, "Despite the fact that we support the format, users should
continue to be cautious about using AutoCAD 2004 DWG files for
projects which require long-term data access as the format does
contain encryption."
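For readers wondering what a "magic-number/XOR algorithm" might look like in general, here is a hedged Python sketch. The magic bytes and header contents are hypothetical, chosen only for illustration; they are not the values in the DWG format, and this is not a reconstruction of Autodesk's code.

```python
from itertools import cycle

# Hypothetical magic sequence for illustration; NOT the real DWG values.
MAGIC = bytes([0x5A, 0xC3, 0x0F, 0x96])


def xor_mask(data: bytes, magic: bytes = MAGIC) -> bytes:
    """XOR each byte of data with the magic sequence, repeated as needed."""
    return bytes(b ^ m for b, m in zip(data, cycle(magic)))


header = b"example header bytes"
scrambled = xor_mask(header)

# XOR is its own inverse: applying the same mask again restores the data.
restored = xor_mask(scrambled)

print(scrambled.hex())
print(restored)
```

Because the same fixed sequence both scrambles and restores the data, anyone who works out the sequence can read the header, which may be why the two sides could describe the same mechanism in such different terms.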
Again, a series of specific claims, although the Alliance
seems to have dropped an earlier allegation that there were
billions of magic number permutations and that the encryption
could be changed on the fly without introducing new builds of
AutoCAD.
Conclusion
The AutoCAD 2004 DWG encryption debate is a very complex one
to follow, with regard to both an understanding of the technology
itself and the semantics of the arguments. I have to believe
Carl Bass when he says that Autodesk has not encrypted the file
- not in the classic definition of running an algorithm over the
DWG to hide its contents from all outsiders. If that had been
the case then it was a failure because Cimmetry announced
support within a month of AutoCAD 2004's shipping and it only
took the OpenDWG Alliance a month beyond that. Besides, Autodesk
is savvy enough to foresee the kind of bad press that would
result from doing something as outrageous as encrypting the
basic DWG file and, in effect, trapping their customers.
That said, Autodesk has done a major amount of work to the
AutoCAD file format and data compression is, in some way, data
obfuscation, where data is reduced in size via a formula
(algorithm) and reconstituted on loading. As compression is
standard across the industry, one could hardly point a wagging
finger at Autodesk. I have it on good authority, however, that
the compression used in AutoCAD 2004 is very complex and doesn't
appear to give DWG any greater ratio of compression than PKZIP
provides. So why adopt it? I haven't had a sufficient
explanation from Autodesk.
Although Autodesk obviously hasn't overtly encrypted the 2004
DWG, certainly it isn't in the company’s interest to make the
reverse-engineering process any easier for its competitors.
Autodesk has made code changes to AutoCAD LT in the past to
"dissuade" application developers from coming up with
applications for AutoCAD LT. It may well be that this was taken
into consideration when the new DWG was being devised; with all
the changes, improvements, and semantics it's difficult to see
the big picture.
If anything, Autodesk should be more worried about why
people were so willing to believe that it had overtly
encrypted the file format. Indeed, on my travels it appears that
the general perception is that Autodesk has used encryption; "It
sounds like something Autodesk would do" being a typical
response.
In light of such widespread negativity, perhaps Autodesk should
respond with more openness to its customers and to the industry
at large. Autodesk's negativity towards Adobe's PDF format makes
one think that format definition and control of those formats
really does matter to Autodesk.
One of the biggest issues in the CAD world is
interoperability, the battle between proprietary systems and
open formats. Nearly all CAD systems are in some way proprietary
because they are devised and controlled by the company that
originated them. As a customer of these software firms you own
the information that is stored in their "wrapper" (file format)
but do not always have an independent way of gaining access to
that information. The OpenDWG Alliance claims it is acting for
the good of open systems. But it's worth noting that the group’s
existence is funded by competitors to AutoCAD and its DWG
format. I think this is an awkward position to defend. As for
Autodesk, it tends to rest its "open format" laurels on DXF,
developed in the 1980s to solve the company’s own problems of
transferring DWGs between incompatible operating systems.
There are no real saints here, on either side of the divide.
About the Author
Martyn Day is group editor of MCAD Magazine and AEC
Magazine. For more information, visit the
CADserver website.