HISTORY.TXT (for DosLynx v0.33b, November 2006)
by Fred C. Macall

Introduction

This is for my twelfth release of DosLynx. In this document, I'll try to take up where I left off in the last history.txt and not rehash the history of the previous versions. However, there is a cumulative list of unresolved bugs, issues, and To Do(s) at the end.

If you want to know everything else there is to know about my previous releases of DosLynx, you can get most of that history from:

http://users.ohiohills.com/fmacall/dlx2xdoc.zip

This archive collects the info.htm and history.txt documents for all eight of my DosLynx v0.2xb releases. For the rest, the documents for DosLynx v0.30b, v0.31b, and v0.32b are available at:

http://users.ohiohills.com/fmacall/dlx30/info.htm
http://users.ohiohills.com/fmacall/dlx30/history.txt
http://users.ohiohills.com/fmacall/dlx31/info.htm
http://users.ohiohills.com/fmacall/dlx31/history.txt
http://users.ohiohills.com/fmacall/dlx32/info.htm
http://users.ohiohills.com/fmacall/dlx32/history.txt

Protected Mode Version breakout( ) Call Upon Showing a Long URL

MYALLOC.CPP, introduced in the history.txt for DosLynx v0.31b, contains a realloc( ) function consisting only of a wrapper for a breakout( ) call. It is included in the DosLynx Protected Mode version, but not in the Real Mode version. The idea is that all the realloc( ) calls in DosLynx should have been replaced with caprealloc( ) calls.

Well, when I worked on MYALLOC.CPP, I had already forgotten a realloc( ) call I added to TURLView::handleEvent( )'s case cmShowDestination, in TURLVIE9.CPP. I think that addition came in the course of implementing the Navigate|Show Destination URL (Alt-U) dialog's Paste button. That work was reported under the heading: A Second Paste Button, in the history.txt for DosLynx v0.29b. The troublesome realloc( ) call was intended to resize a memory allocation containing a copy of the URL string being reported, after truncating a long URL string to 250 octets.
This wasn't a significant issue with the DosLynx Real Mode version, because it contains the realloc( ) function supplied by the Borland C/C++ compiler's runtime library. I've replaced that overlooked realloc( ) call with the caprealloc( ) call that I should have used in the first place.

Bad /P Command Line Parameter Value Triggered a Protection Violation

When given a bad /P command line parameter value, the DosLynx Protected Mode version sometimes tried to use an invalid selector number, in tcp_config( ) in WATTCP module PCCONFIG.C. This occurred because, in that case, tcp_config( )'s (auto) char * temp variable would get used without having been set. The same problem probably occurred, but tended to go unnoticed, in the DosLynx Real Mode version. I've resolved this issue by moving the setup for tcp_config( )'s temp variable out of the else clause that previously contained it, so that the setup now follows that clause.

HTTP Location: Header Field Recognition Failure

HTLoadHTTP( ), in HTTP.C, wasn't case insensitive when, after receiving a 3xx (redirect) response, it looked through the received Headers for a Location: field. This caused it to fail to find the location: field that at least one Web server likes to use. I've extended HTLoadHTTP( ) to make it completely case insensitive in its search for a Location: Header field.

Missing HTML Tag Caused Unexpected Error in Formatting

HText_beginAnchor( ), HText_beginSelect( ), and HText_endForm( ), all defined in GRIDTEXT.CPP, contain precautionary calls to HText_endSelect( ) for dealing with missing HTML tags. However, these precautionary calls aren't sufficient to take care of the exception(s) they were intended to handle, because a call to HText_endSelect( ), by itself, is no longer sufficient to signal the end of a tag is reached. This has worked out as well as I imagined it would, with one big exception. The work done for DosLynx v0.27b only provided for deferring the processing of a single cpimg and silp->HTCApimg.
It then uses a capcalloc( ) call to construct another empty struct imglst object and links it into silp->next. HTML_endAnchor( ) contains several HText_endAnchor( ) calls and is designed to replace an HText_endAnchor( ) call in the six places in HTML.C where that is appropriate. Of course, case HTML_A in HTML_end_element( ), in HTML.C, is one of those six places. freeimglst( ) gets called from the two places in HTML.C where lastimg used to get cleared.

Perfected Support for the ISO-8859-1 (8-bit) Character Set

I have been cataloging this issue as:

- The handling of special characters seems to need work.

There never seemed to be time for this issue until a Protection violation arose, in the DosLynx Protected Mode version, from UnEscapeEntities( ) in HTML.C. This protection violation came as a result of nonsense being performed in the name of processing numeric character entities. Upon having my attention so forcefully drawn to that nonsensical processing, I decided that the time had come to tackle this issue.

Let me recount the ways I found that DosLynx was deficient in its handling of special characters. By "special characters", I mean the 96 characters that ISO-8859-1 adds to the ASCII (7-bit) set. The nonsensical handling of numeric character entities, in UnEscapeEntities( ), was only the first of these.

By "numeric character entities", I mean sequences like &#160; . By "character entity names", I mean sequences like &nbsp; . By "character entity", I mean either a numeric character entity or a character entity name.

The provisions for handling character entity names, in UnEscapeEntities( ) and in SGML_character( ) in SGML.C, made some sense. But these still had issues, too. First, the translate tables, static CONST char * entities[ ] in HTMLDTD.C and static char * ISO_Latin1[ ] in HTML.C, contained entries for only about half of the one hundred plus entity names needed. The entries in ISO_Latin1[ ] generally provided a translation to the PC's Code Page 850 encoding.
However, no such translation was provided for any of an HTML document's other characters. The nonsense section for processing numeric character entities in UnEscapeEntities( ) seemed to take a stab at checking for out-of-range numeric values. Meanwhile, the more reasonable counterpart code in SGML_character( ) had no such check. And, all four of the character entity handling sections had their own individual ways of dealing with apparent entity sequences ending in a character other than semi-colon!

I started work on this project by filling out entities[ ] with an additional entry for each of the special characters that hadn't been represented before. Since entities[ ]'s name entries are alphabetically ordered and have to be matched by at least one other table, I also added a sequence number and ISO-8859-1 encoding value to the comments for all entries.

Next, I needed to update ISO_Latin1[ ] to match the updated entities[ ]. Here, I also decided to remove the translation to PC code. That translation has been replaced as described below. Without translation, ISO_Latin1[ ] needs just a single character per entry. So, I've replaced it with the new array static unsigned char ISO8bit[ ], which has all new initialization.

Next, I took the nonsense out of the numeric character entity handling part of UnEscapeEntities( ). At this point, I also decided to pass all unrecognized or out-of-range entity sequences through without change. I also decided on a consistent handling for the last character of accepted character entity sequences. It is no longer required to be a semi-colon; it may now be any non-alphanumeric character. Then, assuming an accepted character entity value, if the last character isn't &, <, or >, it will be consumed by entity processing. If it is one of those three characters, it will be left for reprocessing as the next input character. I also revised both of SGML_character( )'s character entity handling sections to match these rules.
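The acceptance rules just described can be sketched in C. This is only an illustration, not the code in UnEscapeEntities( ) or SGML_character( ): accept_numeric_entity( ) is a hypothetical name, it handles only decimal forms like &#160; (not hex), and treating the end of the string as a terminator that is never consumed is an assumption of the sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not DosLynx's actual code.
 * Tries to accept a decimal numeric character entity at the start of s.
 * Returns the number of input octets used up, or 0 when the apparent
 * sequence should be passed through unchanged.
 * On success, *value receives the ISO-8859-1 code (0..255). */
static size_t accept_numeric_entity(const char *s, int *value)
{
    size_t i = 2;
    int v = 0;

    if (s[0] != '&' || s[1] != '#' || !isdigit((unsigned char)s[2]))
        return 0;                       /* not a numeric entity at all */

    while (isdigit((unsigned char)s[i])) {
        v = v * 10 + (s[i] - '0');
        if (v > 255)
            return 0;                   /* out of range: pass through */
        i++;
    }

    /* The terminator need not be ';' but must be non-alphanumeric. */
    if (isalnum((unsigned char)s[i]))
        return 0;                       /* bad terminator: pass through */

    *value = v;

    /* Consume the terminator unless it is '&', '<', or '>', which are
     * left to be reprocessed as the next input character.  The end of
     * the string (an assumption of this sketch) is never consumed. */
    if (s[i] != '\0' && s[i] != '&' && s[i] != '<' && s[i] != '>')
        i++;
    return i;
}
```

For example, "&#65<b>" is accepted as value 65 with the < left for the tag parser, while "&#8217;" exceeds 255 and is passed through unchanged.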
There are two notable results from all the changes discussed above. First, one now observes a much lower rate of character entity name sequences going unrecognized. This, of course, is mostly attributable to having added entries, for about fifty missing entity names, to the tables. However, in a few documents, no longer requiring some character entity sequences to end in a semi-colon is also a factor.

On the other hand, one now observes a much higher rate of apparent numeric character entity sequences going unaccepted or unrecognized. This is because DosLynx now rejects, and passes through, all apparent numeric character entity sequences having values above 255, something it did rarely, if ever, before. Many of the apparent numeric character entity sequences now being rejected seem to represent a relatively small sample of defined Unicode characters: characters like apostrophes, bars, dashes, and left and right quotes. DosLynx still has no way to handle Unicode characters. But, I am thinking of having it provide ISO-8859-1 approximations for the more frequently seen numeric Unicode character entities.

At about this point, translation from ISO-8859-1 encoding to PC code became the last main issue to be treated in this area. I decided to include this translation function in HText_appendCharacter( ), in GRIDTEXT.CPP. This translation has to be to Code Page 850, because none of the other Code Pages defined for DOS carry all of the ISO-8859-1 characters. This is unfortunate, because the DOS default or hardware Code Page in English-speaking countries tends to be Code Page 437. To help a little with this, I've added a section entitled Code Page 850 Cook Book to the DosLynx README document. This new section provides a fairly simple recipe for changing one's DOS setup, for the PC's console or display, to use the Code Page 850 character set.
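The ISO-8859-1 approximation idea mentioned above is not implemented in DosLynx v0.33b. Purely as an illustration, it could take the form of a small lookup table consulted before an out-of-range numeric entity is passed through. The table, the stand-ins chosen, and the name approximate_unicode( ) are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical table: DosLynx v0.33b has no such table. */
struct uni_approx {
    long code;              /* Unicode value from an &#nnnn; sequence */
    unsigned char iso;      /* ISO-8859-1 stand-in                    */
};

static const struct uni_approx approx[] = {
    { 8211L, '-'  },        /* U+2013 EN DASH                         */
    { 8212L, '-'  },        /* U+2014 EM DASH                         */
    { 8213L, '-'  },        /* U+2015 HORIZONTAL BAR                  */
    { 8216L, '\'' },        /* U+2018 LEFT SINGLE QUOTATION MARK      */
    { 8217L, '\'' },        /* U+2019 RIGHT SINGLE QUOTATION MARK     */
    { 8220L, '"'  },        /* U+201C LEFT DOUBLE QUOTATION MARK      */
    { 8221L, '"'  },        /* U+201D RIGHT DOUBLE QUOTATION MARK     */
    { 8226L, 0xB7 },        /* U+2022 BULLET -> ISO MIDDLE DOT        */
};

/* Returns an ISO-8859-1 stand-in for a numeric entity value above 255,
 * or 0 when none is known (the sequence is then passed through). */
static unsigned char approximate_unicode(long code)
{
    size_t i;

    for (i = 0; i < sizeof approx / sizeof approx[0]; i++)
        if (approx[i].code == code)
            return approx[i].iso;
    return 0;
}
```

A linear search is adequate for a table this small; a longer table kept in numeric order could use bsearch( ) instead.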
However corrections for the problems identified above might have been structured, an additional translate table would need to be added to the original entities[ ] and ISO_Latin1[ ] pair. Given the choices I had already made, neither entities[ ] nor ISO8bit[ ] is relevant to translation from ISO-8859-1 encoding to PC Code Page 850. So, I needed to introduce a new translate table for HText_appendCharacter( )'s use. This new table, which doesn't need to correspond with entities[ ], is located in GRIDTEXT.CPP.

I had seen a surprisingly large reduction in the size of DOSLYNX.EXE result from my replacement of static char * ISO_Latin1[ ] with static unsigned char ISO8bit[ ]. (The .EXE size change was around eight to ten octets per ISO_Latin1[ ] element, rather than the five octets per element, for each element's pointer and string-ending null, that I had initially expected. I now suppose that there must have been four additional octets needed, per element, for initializing or relocating the pointers that ISO_Latin1[ ] included.) So, I was inclined not to make this new table follow the example of ISO_Latin1[ ]. A study of ISO_Latin1[ ]'s entries made this choice clear. Only two or three of its entries were anything other than single character strings. Those few longer entries could be handled by special casing in HText_appendCharacter( ). So, the new translate table may as well be a char array. I've declared it const char ISO2pccp[ ].

ISO2pccp[ ] provides for translating only the 96 characters that ISO-8859-1 adds to ASCII. That is, characters in the range of &#160; to &#255; . The ASCII (7-bit) characters are well matched by the PC's Code Pages, including Code Page 850. So, the ASCII characters don't need translation. And, except in their most significant bit, the characters in the range of &#128; to &#159; correspond to the ASCII control characters. They have been left undefined by the HTTP, HTML, and/or ISO-8859-1 specification(s).
So, except for &#151;, I've left them to be defined by the DOS Code Page in use, as well. Normally, that will be Code Page 850. Right? A special case in HText_appendCharacter( ) grandfathers &#151; as an EM Dash and translates it to &#196; . Fortunately, &#196; is also an EM Dash in Code Page 437.

Putting translation into place in HText_appendCharacter( ) broke the Horizontal Rules. To fix them, I replaced the '-' (&#196;) literal in case HTML_HR, in HTML_start_element( ) in HTML.C, with '\227' (&#151;). I wasn't able to find any other PC coded literal(s), like this one, in HTML.C.

Finally, having seen the big reduction in the size of DOSLYNX.EXE that came from replacing ISO_Latin1[ ] with ISO8bit[ ], I decided to change static CONST char * entities[ ] to static CONST char entities[ ][8]. This did yield another reduction in the size of DOSLYNX.EXE, but not such an impressive one, because the average length of the entities strings increased from about six to eight octets in this change.

This change involved a matching change to the definition of entity_names, in typedef struct { . . . } SGML_dtd in SGML.H. entity_names is now declared as CONST char (* entity_names) [8] there. This is an instance of tricky C parenthesization: entity_names is now defined as a pointer to char . . . [8] (an array of eight chars), rather than as an array of pointers to char. Next, the initial data value given for number_of_entities, in PUBLIC CONST SGML_dtd HTML_dtd in HTMLDTD.C, needed restatement to sizeof(entities)/8. And, of course, the actual users of entities[ ][8] needed adjustments for this change. These are UnEscapeEntities( ), in HTML.C, handle_entity( ), in SGML.C, and HTMLGen_put_entity( ), in HTMLGEN.C.

DosLynx now consistently translates all of its input from ISO-8859-1 encoding to DOS Code Page 850 in its document presentations. (Meanwhile, its three Save Source and Download type commands remain transparent. They don't do any translation on the data they process.)
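The table changes described above can be illustrated with a short C sketch. Several assumptions apply: only a handful of entity names and Code Page 850 values are shown; dtd_sketch, html_dtd, iso8bit[ ], iso2pccp[ ], entity_value( ), and to_pc_code( ) are hypothetical stand-ins for SGML_dtd, HTML_dtd, ISO8bit[ ], and ISO2pccp[ ]; the names are assumed to be kept in strcmp( ) order so bsearch( ) can find them; C99 designated initializers are used only for brevity; and '?' for untranslated entries is this sketch's own fallback.

```c
#include <stdlib.h>
#include <string.h>

/* Entity names stored in place as char[8] rows, in strcmp( ) order. */
static const char entities[][8] = {
    "AElig", "Aacute", "eacute", "nbsp", "szlig", "uuml"
};

/* ISO-8859-1 value for each entities[] row, in the same order. */
static const unsigned char iso8bit[] = { 198, 193, 233, 160, 223, 252 };

/* A pointer to char[8], not an array of pointers: the parentheses matter. */
typedef struct {
    const char (*entity_names)[8];
    int number_of_entities;
} dtd_sketch;

static const dtd_sketch html_dtd = { entities, sizeof(entities) / 8 };

static int cmp_name(const void *key, const void *row)
{
    return strcmp((const char *)key, (const char *)row);
}

/* Returns the ISO-8859-1 code for an entity name, or -1 if unknown. */
static int entity_value(const char *name)
{
    const char (*hit)[8] = bsearch(name, html_dtd.entity_names,
                                   (size_t)html_dtd.number_of_entities,
                                   sizeof html_dtd.entity_names[0], cmp_name);
    return hit ? iso8bit[hit - html_dtd.entity_names] : -1;
}

/* Partial ISO-8859-1 -> Code Page 850 table for codes 160..255.
 * Zero marks entries this sketch leaves out. */
static const unsigned char iso2pccp[96] = {
    [160 - 160] = 0xFF,     /* no-break space  */
    [176 - 160] = 0xF8,     /* degree sign     */
    [193 - 160] = 0xB5,     /* A acute         */
    [198 - 160] = 0x92,     /* AE ligature     */
    [223 - 160] = 0xE1,     /* sharp s         */
    [233 - 160] = 0x82,     /* e acute         */
    [252 - 160] = 0x81,     /* u diaeresis     */
};

static unsigned char to_pc_code(unsigned char c)
{
    if (c < 128)
        return c;                   /* ASCII already matches the PC */
    if (c < 160)                    /* 151 grandfathered as EM Dash; the  */
        return c == 151 ? 196 : c;  /* rest are left to the Code Page     */
    return iso2pccp[c - 160] ? iso2pccp[c - 160] : '?';
}
```

Because '\227' is 151, the Horizontal Rule literal mentioned above comes out as PC code 196 through the same special case.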
Now that its document presentations are consistently translated, one's viewpoint naturally adjusts to expect DosLynx's handling of the full 8-bit character set to always make sense. From this viewpoint, the File|Save Rendering... (F2) menu entry or command now presents a dilemma. Should this command write its file with ISO-8859-1 encoding or with Code Page 850 characters? Or, optionally, either? The trouble with writing using ISO-8859-1 encoding is that other PC utilities, such as the TYPE or PRINT commands, may not have any way to correctly handle that encoding. On the other hand, writing using the Code Page 850 character set yields a file that DosLynx can't correctly read back!

I've taken the course of least effort, for now. The File|Save Rendering... (F2) menu entry or command will (continue to) write its file with Code Page 850 characters. That will preserve this file's suitability for PC utilities that don't know ISO-8859-1 encoding. However, as explained above, DosLynx doesn't have any way to correctly read back or import such a file, for now. This arrangement adds another reason for using one of the three DosLynx Save Source and Download type commands when a file is desired for later viewing with DosLynx. These commands are: File|Open URL... (F3, with its Download button), File|Save Source (Alt-S), and Navigate|Download Selection (Alt-D). I welcome suggestions for resolving this issue.

Parting Shot

While testing HTML examples containing some of the five newly supported HTML tags discussed above, the DosLynx Protected Mode version demonstrated another Protection violation. This one came from HTML_put_value( ), in HTML.C. HTML_put_value( ) was introduced in the history.txt for DosLynx v0.26b. The trouble was that HTML_put_value( ) was being called with zero for its CONST char * cpstr parameter, from case HTML_AREA in HTML_start_element( ) in HTML.C. The example being tested just then contained an HTML directives. And/or, provide a menu entry or command for designating all cached document(s) "stale".
(Bypass: Use the Navigate|Reload Current menu entry or command, when necessary.)
- Find a way for the Protected Mode version to successfully support the stock Microsoft mouse driver in Windows 2000 windows. (This isn't an issue with the DosLynx Real Mode version. Bypass: Use mouse= configuration to disable support of the mouse, where necessary.)
- Find a graceful way for the FTP client to deal with directory reports that are too large to be memorized. (This is hardly an issue with the DosLynx Protected Mode version.)
- Find a way for the File|Save Source and Navigate|Download Selection menu entries to handle FTP directory reports that are too large to be memorized. (This is hardly an issue with the DosLynx Protected Mode version.)
- When your display is set in a VESA mode upon running DosLynx, its arrangement(s) for saving and restoring the screen mode may break down. (Bypass: Set your display into a "legacy mode" before running DosLynx.)
- Provide for displaying directory entry details, beyond file name, from the WAR-FTP Daemon's DOS style directory reports.
- Provide a configurable "download directory". (Bypass: Use the command line /P option (as demonstrated in DOSLYNX.BA_) and set your DOS default drive and path to specify the directory you want to download into.)
- Provide a "disk view" number at the bottom of the screen that isn't limited to the 2.1 GB maximum size of a DOS FAT16 partition, so that the free space available for a DosLynx temporary directory in a large DOS FAT32 partition won't be under-reported. (Presently, the disk view number stays at or about 2147418112 as long as the large partition has at least that much space free.)

I hope you'll let me know what you think about the way the DosLynx Protected Mode version has turned out, or if you have any comment(s) or question(s) in any connection with DosLynx v0.33b!
As I don't presently have a '286-based PC, I am especially interested in hearing from anyone who is able to try the DosLynx Protected Mode version on such a system.

fcm