Doc (computing)
Microsoft Open Specification Promise. The Microsoft Open Specification Promise, or OSP, is a promise by Microsoft, published in September, to not assert legal rights over certain Microsoft patents on implementations of an included list of technologies.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in these materials.
Except as expressly provided in the Microsoft Open Specification Promise and this notice, the furnishing of these materials does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The information contained in this document represents the point-in-time view of Microsoft Corporation on the issues discussed as of the date of publication.
The first part of the structure contains field codes which instruct Word to insert text into the second part of the structure, the field result. Fields in Word are used to insert text from an external file or to quote another part of a document, to mark index and table of contents entries and produce indexes and tables of contents, maintain DDE links to other programs, to produce dates, times, page numbers, sequence numbers, etc.
There are 91 different field types. A field begin mark delimits the beginning of a field and precedes any of the field codes stored in the field. The end of the field codes and the beginning of the field result is marked with the field separator and the field result and the field itself are terminated by a field end mark.
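The begin mark, separator, and end mark structure described above can be illustrated with a small sketch. It assumes the three marks are the characters 0x13, 0x14, and 0x15 (the values used in Word 97 text streams) and it skips the check of the special character property that a real reader would need. `parse_fields` is a hypothetical helper name, not part of any Word API.

```python
FIELD_BEGIN, FIELD_SEP, FIELD_END = "\x13", "\x14", "\x15"  # assumed mark values

def parse_fields(text):
    """Split text into visible text and a list of (depth, code, result) fields."""
    stack = []    # one entry per field that is currently open
    fields = []   # completed fields, innermost first
    plain = []    # text visible outside any field code

    def emit(s):
        # Route text to the enclosing field's code or result, or to plain text.
        if not stack:
            plain.append(s)
        else:
            top = stack[-1]
            (top["result"] if top["in_result"] else top["code"]).append(s)

    for ch in text:
        if ch == FIELD_BEGIN:
            stack.append({"code": [], "result": [], "in_result": False})
        elif ch == FIELD_SEP and stack:
            stack[-1]["in_result"] = True
        elif ch == FIELD_END and stack:
            top = stack.pop()
            result = "".join(top["result"])
            fields.append((len(stack), "".join(top["code"]).strip(), result))
            emit(result)  # the field result is what the enclosing context sees
        else:
            emit(ch)
    return "".join(plain), fields
```

A nested field contributes its result to the code of the field that encloses it, which is why the stack is needed rather than a single flag.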
The CP locations of the field begin mark, field separator, and field end mark are recorded in plcfld data structures that are maintained for the main document and all of the subdocuments of the main document whenever a field is inserted or edited. A field can be dead, in which case it has no field separator, no field result, and no entry in the plcfld.
See the definition of the FLD structure for a list of possible dead field code strings. An array of two-byte FLD structures is stored in the plcfld in a 1-to-1 correspondence with the recorded CP entries. An FLD associated with a field begin mark records the type of the field. An FLD associated with the field end mark records the current status of the field i.
Fields may be nested. Twenty (20) levels of nesting are permitted. FKP (Formatted disK Page): A data structure that fits in one page and encodes either the character properties or the paragraph properties of a certain portion of a Word file.
An FKP consists of four components: (1) a count of the number of runs or paragraphs described by the page. Each BX begins with an offset that locates the properties of the paragraph that begins at a particular FC.
Then search through the bin table for the type of property you want to produce, to find the FKP in the document stream whose array of FCs encompasses the FC of the document character. Add this offset to the beginning address of the FKP in memory. The text stream of a non-complex file can be described by an fc an offset from the beginning of the file to mark where the text begins and a ccp count of CPs to record how many characters are stored in the text stream.
However, a full-saved piece table will not have property modifiers prms and all text in the file is referenced by the piece table. The 0th sprm is recorded at offset 0 of the structure. Any succeeding sprms are recorded immediately after the end of the preceding sprm. OLE 2. Only main documents and header documents contain Office Drawing objects. The native data for an Office Drawing object may be obtained by taking the CP for the special character and using this to find the corresponding entry in the plcspa.
An entry in this plc consists of a FSPA structure, which is described elsewhere in this document. Office Drawing objects can have text attached to them. Text for the textboxes is stored separately in the textbox subdocument of the main or header document.
Textboxes can be linked in chains of up to 32 textboxes. Ordering of textboxes in the subdocument is completely unrelated to the document structure due to the nature of textbox linking. This contains an index itxbxs into plctxbxs and a sequence number in the chain of linked textboxes. So, for each entry in the plctxbxs there is a corresponding entry in the plctxbxBkd at the same CP, and there may be additional entries in the plctxbxBkd to describe the breaks from one textbox to the next in linked textbox chains.
In Word data structures, an unsigned two-byte integer page number is given the acronym PN (Page Number). The PAPX contains an ISTD (a style code that identifies the style in control of the paragraph) and a grpprl, which specifies how the style's paragraph properties must be changed to produce the paragraph properties of the paragraph. A paragraph style provides a set of character and paragraph property defaults for the text of any paragraph tagged with that style.
When a new paragraph is created and given a particular style, newly typed text is set to the character and paragraph properties of that style unless the user makes an exception to the paragraph style definition by performing other editing operations. The fcPic is a byte offset into the data stream.
Beginning at the position recorded in chp. If the picture is an Office shape, a Windows metafile or a bitmap, the shape, metafile or bitmap will immediately follow the PIC.
Pictures that are a reference to an Office shape file will include both the filename and the shape in that order. Pictures inserted with Word 97 and later versions are in the new Office shape format documented elsewhere. However, pictures can be copied from older files into newer ones and their old format will persist until the picture is edited or displayed.
See Appendix B for a discussion of this technique. The array of CPs in the plcfpcd defines a partitioning of the Word document into disjoint pieces. The second array is an array of PCDs (Piece Descriptors), in 1-to-1 correspondence with the array of CPs, that records the physical location in the Word file where the corresponding piece begins.
To find the physical location of a particular logical character in a Word document, take the CP coordinate of that character within the document and find the piece that contains that character. Finally, add the offset of the desired character from the beginning of its piece to the FC of the beginning of the piece.
If the second most significant bit of the FC is clear, the FC is the actual file offset of the Unicode character (two bytes). If the second most significant bit is set, the codepage-compressed version of the Unicode character (one byte) is stored at the offset obtained by clearing this bit and dividing by two. The lengths of the data structures stored in PLCFs within Word files are listed later in this document.
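The lookup from a CP to a physical file offset can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: `cp_to_file_offset` is a hypothetical name, the piece table is passed in as two plain lists (n+1 boundary CPs and n raw FC values taken from the PCDs), and the character offset within an uncompressed piece is doubled because each Unicode character occupies two bytes.

```python
from bisect import bisect_right

COMPRESSED_FLAG = 0x40000000  # second most significant bit of a 32-bit FC

def cp_to_file_offset(cp, piece_cps, piece_fcs):
    """Return (file_offset, bytes_per_char) for the character at position cp.

    piece_cps: n+1 ascending CP boundaries from the plcfpcd.
    piece_fcs: n raw FC values, one per piece (assumed already extracted).
    """
    if not piece_cps[0] <= cp < piece_cps[-1]:
        raise ValueError("CP lies outside the text described by the piece table")
    i = bisect_right(piece_cps, cp) - 1   # index of the piece that contains cp
    delta = cp - piece_cps[i]             # character offset within that piece
    raw_fc = piece_fcs[i]
    if raw_fc & COMPRESSED_FLAG:
        # Compressed piece: clear the flag, halve, then step one byte per character.
        return (raw_fc & ~COMPRESSED_FLAG) // 2 + delta, 1
    # Unicode piece: the FC is the real offset, two bytes per character.
    return raw_fc + 2 * delta, 2
```

Returning the character width alongside the offset lets the caller decide how many bytes to read and which decoder to apply.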
PLF (PLex stored in File): A data structure consisting of an array of structures preceded by a long count of structures. If the user has made only a small change to formatting that can be expressed as a single 1- or 2-byte sprm, that sprm is stored within the prm. A single run may cross paragraph boundaries and may encompass the entire document.
Users frequently treat sections as the equivalent of a chapter in a book. The boundaries of sections mark locations where the layout rules for a document (number of columns, text of headers and footers to use, whether page numbers should be displayed, etc.) change. The array of CPs in the plcfsed records the boundaries of sections in the Word document.
If the FC stored in a SED is -1, the section properties of the section are exactly equal to the standard section properties. Use this index to locate the SED in the plcfsed which describes the section. It consists of an operation code which identifies the field(s) to be changed, and an operand which gives the value that a particular field is changed to or a parameter passed to a procedure to change the field or fields.
A prl (property modifier stored in a list) is a sprm plus its operand. Every PAPX for every paragraph recorded in a document contains an ISTD which identifies the style from which a paragraph inherited its default character and paragraph properties. STTBFs consist of an optional short containing 0xFFFF, indicating that the strings are extended character strings, a short indicating how many strings are included in the string table, another short indicating the size in bytes of the extra data stored with each string, and then each string followed by its extra data.
Non-extended character Pascal strings begin with a single byte length count which describes how many characters follow the length byte in the string. Extra data associated with a string may also be stored in an sttbf.
Extended character strings are stored just the same, except they have a double byte length count and each extended character occupies two bytes. Each subdocument has its own CP coordinate space.
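The STTBF layout described above can be read with a short routine. This is a sketch under assumptions: `parse_sttbf` is a hypothetical name, all shorts are taken to be little-endian, extended characters are decoded as UTF-16LE, and non-extended strings are decoded as cp1252 (the real code page depends on the document).

```python
import struct

def parse_sttbf(data):
    """Return a list of (string, extra_bytes) pairs from a serialized STTBF."""
    pos = 0
    (first,) = struct.unpack_from("<H", data, pos)
    extended = first == 0xFFFF          # optional marker for extended strings
    if extended:
        pos += 2
    # String count, then size in bytes of the extra data kept with each string.
    count, cb_extra = struct.unpack_from("<HH", data, pos)
    pos += 4
    entries = []
    for _ in range(count):
        if extended:
            (n,) = struct.unpack_from("<H", data, pos)   # two-byte length count
            pos += 2
            text = data[pos:pos + 2 * n].decode("utf-16-le")
            pos += 2 * n
        else:
            n = data[pos]                                # one-byte length count
            pos += 1
            text = data[pos:pos + n].decode("cp1252")    # assumed code page
            pos += n
        extra = data[pos:pos + cb_extra]
        pos += cb_extra
        entries.append((text, extra))
    return entries
```

Because the marker is optional, a non-extended table whose count happened to be 0xFFFF would be misread by this sketch; a production reader would need extra context to rule that out.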
In other words, data structures are stored in Word files that are components of these subdocuments. In full-saved documents, a simple calculation with values stored in the FIB produces the file offset of the beginning of the subdocument text streams if they exist. The length of these streams is also stored. In fast-saved documents, the piece tables of subdocuments are concatenated to the end of the main document piece table. In this case, to identify the beginning of subdocument text, you must sum the length of the main document text stream with the lengths of any subdocument text streams stored ahead of the subdocument information stored in the FIB and treat this sum as a CP coordinate.
To retrieve the text of the subdocument, you must do lookups in the piece table, starting with the piece that contains the beginning CP coordinate, to find the physical location of each piece of the subdocument text stream.
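The summation used to locate subdocument text in a fast-saved file can be sketched as a running total. The function name, the labels, and the storage order used in the example (main, footnote, header, annotation) are assumptions for illustration; the actual lengths come from the count fields in the FIB.

```python
def subdocument_cp_ranges(lengths):
    """Map each subdocument to its (cp_first, cp_lim) range in the combined stream.

    lengths: ordered list of (name, ccp) pairs, in the order the text is stored.
    """
    ranges = {}
    start = 0
    for name, ccp in lengths:
        ranges[name] = (start, start + ccp)  # cp_lim is one past the last CP
        start += ccp                         # next subdocument begins here
    return ranges

# Hypothetical lengths, purely for illustration.
example = subdocument_cp_ranges(
    [("main", 1000), ("footnote", 50), ("header", 0), ("annotation", 20)]
)
```

Each starting CP obtained this way is then looked up in the piece table to find where the text physically lives.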
The last paragraph of each cell is terminated by a special paragraph mark called a cell mark. Following the cell mark that ends the last cell of a table row, the table row is terminated by a special paragraph mark called a row mark. When Word displays a table row, it assigns a rectangular shaped display area to each cell in the row.
The leftmost display area in a table row is assigned to the 0th cell of the row; the next display area to the right is assigned to the 1st cell of the row, etc.
The text of the cell is wrapped to fit its display area. As more text is added to the cell, the cell display area extends downward. A set of table properties that determine how many cells are in a row, where the horizontal boundaries of cell display areas are, and what borders are drawn around each cell in the table is stored for the row mark that marks the end of the table row.
The information in the TAP for a table row is stored in a Word file as a list of sprms that modify a TAP which has been cleared to zeros. This list of table sprms is appended to the grpprl of paragraph sprms that is recorded in the PAPX for the row mark that delimits the end of a table row. Note: In this document, bit 0 is the low-order bit.
Structures are described as they would be declared in C for the Intel architecture. When numbering bytes in a word from low offset towards high offset, two-byte integers have their least significant eight bits stored in byte 0 and most significant eight bits in byte 1. If bit 31 is the most significant bit in a four-byte integer, bits 31 through 24 are stored in byte 3 of a four-byte integer, bits 23 through 16 are stored in byte 2, bits 15 through 8 will be stored in byte 1, and bits 7 through 0 are stored in byte 0.
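The byte ordering described above is little-endian. A minimal sketch of reading such integers, with hypothetical helper names `read_u16` and `read_u32`, shows that assembling the bytes by hand agrees with Python's `struct` module:

```python
import struct

def read_u16(buf, offset):
    """Two-byte integer: byte 0 holds bits 7..0, byte 1 holds bits 15..8."""
    return buf[offset] | (buf[offset + 1] << 8)

def read_u32(buf, offset):
    """Four-byte integer: byte 0 holds bits 7..0, up to byte 3 with bits 31..24."""
    return (buf[offset]
            | (buf[offset + 1] << 8)
            | (buf[offset + 2] << 16)
            | (buf[offset + 3] << 24))

raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])

# The "<" prefix selects little-endian byte order in struct format strings.
assert read_u16(raw, 0) == struct.unpack_from("<H", raw, 0)[0] == 0x1234
assert read_u32(raw, 2) == struct.unpack_from("<I", raw, 2)[0] == 0x12345678
```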
Naming Conventions: The field names in Word data structures usually consist of a prefix of lower case characters followed by an optional upper case modifier. The following tags are used in the lower case prefix of field names to document the data type of the field: b, used to name a 1-byte integer value; c, a prefix used to signify that an integer value is a count of some number of objects.
Always a 4 byte quantity. The two following modifiers are used occasionally in this documentation: First Means that the variable marks the first of a range of objects.
For example, cpFirst would mark the first character position of a range of characters in a document. Lim Means the variable marks the limit of a range of objects i.
For example, cpLim would be the limit CP of a range of characters in a document. SummaryInformation and DocumentSummaryInformation are widely understood. FIB Stored at the beginning of page 0 of the file. Text of body, footnotes, headers Text begins at the position recorded in fib. Previous versions of Word wrote them in contiguous blocks.
SEPXs are no longer guaranteed to start on a page boundary if it would span a boundary when placed immediately after the preceding SEPX. FIB Stored at beginning of page 0 of the file. Text of body, footnotes, headers stored during last full save Text begins at the position recorded in fib.
Ordinarily a file will contain only one table stream. However, in some unusual circumstances e. This field only appears in auto saved files. These files are normal Word documents in every other way. For example, an auto saved file is typically longer than the equivalent Word document. This is recorded in all Word documents. Format is described in the Office drawing group format document. This is recorded in all Word documents formFldSttbs form field dropdown string tables Written immediately after the previously recorded table, if the document contains form field dropdown controls.
This undocumented structure maps LID and grammar checker type to grammar checking options. This is immediately followed by the allocated data hanging off the LSTFs. Only written during a fast save.
Recorded in all Word documents. plcfspl (spelling state table): Written immediately after the previously recorded table. This is a string table containing the list names for each list. It is parallel with the plcflst, and may contain null strings if the corresponding LST does not have a list name. The sttbfffn is an sttbf where each string is instead an FFN structure (note that, just as for a Pascal-style string, the first byte in the FFN records the total number of bytes, not counting the count byte itself).
The names of the fonts correspond to the ftc codes in the CHP structure. Format of the Data Stream embedded objects-native data Word embedded object structures are sequentially concatenated if the document contains embedded objects. Within this fstorage, zero or more custom XML parts can exist each in their own storage. Each of these storages is stamped with a unique identifier as its storage name. An instance of one of these storages contains two streams within it: 1. A stream named item 2.
FIB The FIB contains a "magic word" and pointers to the various other parts of the file, as well as information about the length of the file.
The FIB starts at the beginning of the file. The FIB is defined in the structure definition section of this document. Text The text of the file starts at fib. No other occurrences of this character sequence are allowed. Other line break or word wrap information is not stored. The following ASCII codes are treated as "special" characters when they have the character property special on chp.
Note: The end of a section is also the end of a paragraph. The last character of a section is a section mark which stands in place of the paragraph mark normally required to end a paragraph. An exception is made for the last character of a document, which is always a paragraph mark, although the end of a document is always an implicit end of section. Otherwise, the document is represented by the piece table stored in the file in the data beginning at fib.
The document text stream includes text that is part of the main document, plus any text that exists for the footnote, header, macro, or annotation subdocuments. The sizes of the main document and the header, footnote, macro and annotation subdocuments are stored in the fib, in variables: fib. Character and Paragraph Formatting Properties Character and paragraph properties in Word documents are stored in a compressed format. The information stored on disk is not the properties of a particular sequence of text but the difference of the properties from a specific reference property.
The PAP is a data structure that holds uncompressed paragraph property information; the CHP (pronounced "chip") is a structure that holds uncompressed character property information.
Each paragraph in a Word document inherits a default set of paragraph and character properties from one of the paragraph styles recorded in the style sheet data structure (STSH). A particular PAP is converted into its compressed form, the PAPX, by first comparing the PAP for a paragraph with the PAP stored in the style sheet for the paragraph's style.
Any properties in the paragraph's PAP that are different from those stored in the style sheet PAP are encoded as a list of sprms (grpprl). It contains an istd (index to style descriptor), which specifies which style entry in the style sheet contains the default paragraph and character properties for the paragraph, paragraph height information, and the list of difference sprms.
If the only difference between the paragraph's PAP and the style's PAP were in the justification code field, which is one byte long, one two-byte sprm, sprmPJc, would be generated to express that difference; thus the total PAPX size would be 5 bytes. This is far smaller than a full PAP. To convert a CHP for a sequence of characters contained within a single paragraph into its compressed form, the CHPX, it is first necessary to know the paragraph style assigned to the paragraph containing those characters and any character style that may be tagging the character run.
The character properties inherited from the paragraph style are moved into a buffer. If the chp. Any properties in the paragraph's CHP that are different from those stored in the generated CHP are encoded as a list of sprms (grpprl).
The sprms express how the content of the CHP generated from the paragraph and character styles should be transformed to create the character properties for the text run. If one of the bit fields in the CHP to be compressed (such as fBold) is different from the reference CHP, you would build a difference sprm using sprmCFBold in the first byte and the byte pattern 0x81 in the second byte, which signifies that the value of the bit in the CHP to be compressed is the opposite of the value stored in the reference CHP.
If there was no difference, sprmCFBold would not be recorded in the grpprl to be generated. If there were a difference in a field larger than a single bit such as the chp. If a sequence of characters has the same character properties and the sequence spans more than one paragraph, it's necessary to examine each paragraph's properties and to generate a different CHPX every time there is a change of style.
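The handling of a bit field such as fBold can be sketched in both directions. The text above only states the meaning of 0x81; the other operand values handled here (0x00 for off, 0x01 for on, 0x80 for "same as the style") are an assumption based on how Word 97 toggle sprms are commonly described. Both function names are hypothetical.

```python
def encode_toggle_difference(reference_value, wanted_value):
    """Return the operand for a difference sprm, or None if no sprm is needed."""
    if wanted_value == reference_value:
        return None      # no difference, so nothing is recorded in the grpprl
    return 0x81          # "opposite of the value in the reference CHP"

def apply_toggle_operand(reference_value, operand):
    """Compute the resulting bit from the reference CHP value and a sprm operand."""
    if operand == 0x00:
        return False                 # assumed: explicitly off
    if operand == 0x01:
        return True                  # assumed: explicitly on
    if operand == 0x80:
        return reference_value       # assumed: same as the reference CHP
    if operand == 0x81:
        return not reference_value   # opposite of the reference CHP
    raise ValueError(f"unexpected toggle operand {operand:#x}")
```

Encoding the difference relative to the style, rather than the absolute value, means a run that is "not bold" inside a bold style stays correct if the style definition later changes.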
In Word documents, the fundamental unit of text for which character exception information is kept is the run of exception text, a contiguous sequence of characters stored on disk that all have the same exception properties with respect to their underlying style character properties. If a user never changed the character properties inherited from the styles used in the document and did a complete save of the document, although each of those styles may have different properties, the entire document stream would be one large run of exception text and one CHPX would suffice to describe the character properties of the entire document.
The fundamental unit of text for which paragraph properties are recorded is the paragraph. An FKP is a data structure that is stored in one page of a Word file.
This byte array, named rgb, is in 1-to-1 correspondence with the rgfc. This array called the rgbx is in 1-to-1 correspondence with the rgfc. Word uses this optimization. An rgb or rgbx[]. When an rgb or rgbx[]. For CHPX FKPs a 0 rgb value means the properties of the run of text were exactly equal to the character properties inherited from the style of the paragraph it was in.
The new FC is added at the end of the rgfc. Bin Tables A bin table plcfbte partitions the total extent of the Word file that contains text characters into a set of contiguous intervals marked by an fcFirst and an fcLim. The fcFirst for the nth interval would be plcfbte. Associated with each interval is a BTE. Even though a sequence of text may be between two paragraph end marks, it may reside in a paragraph different from the one defined by the next paragraph end mark, because the text may have been moved by the user into a different paragraph.
In the logical text stream represented by the document's piece table, the paragraph mark that follows the moved text is stored in a non-adjacent physical location in the file.
Style Sheet A style sheet is a collection of styles. In Word, each document has its own style sheet. A style is a collection of formatting information with a name. Word 6. Versions of Word prior to 6. Character styles have just character formatting. Paragraph styles have both character and paragraph formatting. The style sheet establishes a correspondence between a style code and a style definition. Note: the storage and behavior of styles has changed considerably since WinWord 2. The range of the stc was , with as the null style.
The styles for a document (both paragraph and character styles) are stored in an array in each document. The array can have unused slots. Some slots at the beginning of the array are reserved for specific styles, whether they have been created yet or not. Istd are Heading Istd 10 is Default Paragraph Font. Istd are reserved. So the first non-fixed index is 15 see stshi. Each document has a separate array, so the same style will usually have a different istd in two different documents. (Styles in fixed locations in the style sheet have the same istd in all documents.)
Thus style matching between documents must be done by name, or by sti if the styles are built-in. Styles are usually referred to using an istd. A (doc, istd) pair uniquely identifies a style because it tells which style is in which array. Built-in styles have a unique sti to indicate which built-in style they reference. User-defined styles use stiUser.
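Matching a style from one document against another document's style array can be sketched as below. The dictionary layout, the function name `match_style`, and the numeric value chosen for stiUser (0x0FFE) are assumptions made for this illustration; unused slots in the array are represented as `None`.

```python
STI_USER = 0x0FFE  # assumed value of stiUser, marking a user-defined style

def match_style(style, target_styles):
    """Return the istd in target_styles that matches style, or None.

    Built-in styles are matched by sti; user-defined styles are matched by name.
    """
    for istd, candidate in enumerate(target_styles):
        if candidate is None:
            continue                        # unused slot in the style array
        if style["sti"] != STI_USER:
            if candidate["sti"] == style["sti"]:
                return istd                 # same built-in style, any display name
        elif candidate["sti"] == STI_USER and candidate["name"] == style["name"]:
            return istd                     # user-defined styles match by name
    return None
```

Matching built-in styles by sti rather than by name keeps the lookup working when the two documents carry display names in different languages.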
Every paragraph has a paragraph style. Every character has a character style. The default paragraph style is Normal (stiNormal, istdNormal). The formatting of a paragraph (the PAP) and of a character (the CHP) depends on the paragraph and character styles applied to them, as well as any additional formatting stored in the FKPs.
For a CHP: 1. Properties from the character's style (the UPX). The STSHI contains general information about the following style sheet, including how many styles are in it. The cbStshi to use for those file versions is 4 bytes. Then for each style in the style sheet stshi. The current definition of the STSHI structure might be longer or shorter than that stored in the file, so the style sheet reader routine needs to take this into account. There will be stshi.
Note: styles can be empty, i. The stshi. If the STD base is grown in a future version, the file format doesn't change, because the style sheet reader can discard parts it doesn't know about, or use defaults if the file's STD is not as large as it was expecting. Currently, stshi. Note: the built-in style names may need to be "regenerated" if the file is opened in a different language or if stshi. This indicates the number of fixed-index positions reserved in the style sheet when it was saved.
If not, the built-in style names need to be "regenerated", i. See notes on sprmCRgftcX for details. Introduced in Word stshi. The index into mpstilsd corresponds to the index of the style that the LSD structure affects see std. A cb of zero indicates an empty slot in the style array, i. Because the DOC file format was a closed specification for many years, inconsistent handling of the format persists and may cause some loss of formatting information when handling the same file with multiple word processing programs.
Some specifications for Microsoft Office 97 binary file formats were published under a restrictive license, but these specifications were later removed from online download. Sun Microsystems and OpenOffice. Some historical documentation may use the DOC filename extension for plain-text file formats.
The DOC filename extension was also used in historical versions of WordPerfect for its proprietary format. Some software applications use the name "DOC" in combination with other words, such as the name of the software manufacturer, for different file formats.