This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
The LexerInput for the embedded lexer should contain characters from all the joined embedded sections. This is a continuation of the requests from issue 87014. The advantage is that the embedded lexer will not need to be sensitive to "soft EOFs" at the embedded-section boundaries, so its internal state machine can be much simpler. The lexer infrastructure will perform the token splitting (so that the token hierarchy remains a tree of tokens). This request is also a must-have if there are tokens with larger lookaheads. It is not yet clear whether some of the lexers would benefit from being aware of the soft EOFs; by default we will not add any API for that, since it should be no problem to add it later.
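The core idea can be illustrated with a minimal self-contained sketch (this is illustrative code, not the actual lexer-module implementation; the class names and offsets are made up): several embedded sections of a document are exposed as one continuous character sequence, so the embedded lexer never sees the section boundaries.

```java
/**
 * Illustrative sketch (not the real org.netbeans.spi.lexer code) of joining
 * several embedded sections of a document into one continuous input.
 */
public class JoinedSections {

    /** A view over several sections of a backing text, joined end to end. */
    static final class JoinedCharSequence implements CharSequence {
        private final CharSequence text;  // whole document text
        private final int[][] sections;   // {startOffset, endOffset} pairs
        private final int length;

        JoinedCharSequence(CharSequence text, int[][] sections) {
            this.text = text;
            this.sections = sections;
            int len = 0;
            for (int[] s : sections) {
                len += s[1] - s[0];
            }
            this.length = len;
        }

        public int length() {
            return length;
        }

        public char charAt(int index) {
            // Map an index in the joined view to an offset in the document.
            for (int[] s : sections) {
                int sectionLen = s[1] - s[0];
                if (index < sectionLen) {
                    return text.charAt(s[0] + index);
                }
                index -= sectionLen;
            }
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }

        public CharSequence subSequence(int start, int end) {
            StringBuilder sb = new StringBuilder(end - start);
            for (int i = start; i < end; i++) {
                sb.append(charAt(i));
            }
            return sb;
        }

        public String toString() {
            return subSequence(0, length).toString();
        }
    }

    public static void main(String[] args) {
        // JSP-like document: two HTML sections separated by a scriptlet.
        String doc = "<b><% int i = 0; %>bold</b>";
        int[][] htmlSections = { {0, 3}, {19, 27} };
        JoinedCharSequence joined = new JoinedCharSequence(doc, htmlSections);
        // The embedded HTML lexer would see "<b>bold</b>" as one input.
        System.out.println(joined);
    }
}
```

With such a joined view the HTML lexer can produce a single `<b>...</b>` element structure without any soft-EOF handling at the scriptlet boundary.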
This should certainly be done for NB7.0 since there are features that depend on this.
*** Issue 118892 has been marked as a duplicate of this issue. ***
I'm already working on this and expect to be finished within 6.1 M1. I'm working on an implementation that will allow not only joining all the embedded sections with a given language together (e.g. all HTML sections in a JSP) but also creating multiple subsections. There is currently no API for this, but we have already agreed with Marek that it will be needed anyway, so the implementation will be ahead of the API.
There is still some remaining work to finish so I've changed the TM to 6.1M2. I hope to have a first version ready in about a week.
Finally I have the implementation complete. I would like to ask for an API review, since the change adds extra methods to Token that allow finding out whether a particular token is a join token or a token part. Once the change gets integrated, lexers for language embeddings with LanguageEmbedding.joinSections() == true will see all the sections with the particular language path as one single continuous input. If the produced tokens cross section boundaries, the infrastructure will break the tokens into parts automatically. There should be no changes necessary in the lexers' implementations, since the lexers should not care about how the characters in the LexerInput are composed. TokenSequence.embeddedJoined() allows obtaining a token sequence over the joined tokens.

I'm still adding some extra tests, including a test with random modifications of a document with joined embeddings, and fixing some minor problems. I apologize for a significant underestimation of the necessary work. In the end I had to rewrite both the TokenListUpdater and the TokenHierarchyUpdate almost completely, separating the analysis for regular and joined token lists (TLU.updateRegular() and TLU.updateJoined()) while keeping a common relexing part, TLU.relex(). It was the biggest change in the lexer module since its introduction. OTOH there are some positive aspects:

1) The implementation of a JoinTokenList is rather lightweight. The tokens from the individual EmbeddedTokenList members are not copied into a joined token list. Instead, only the token parts split among multiple ETLs point to a special JoinToken, and an extra EmbeddedJoinInfo carries the meta information necessary for efficient searching and iteration in the JoinTokenList implementation.

2) The implementation allows maintaining either a single join token list (the current state) or multiple join token lists (multiple separated "section groups") across the document, to support a potential use case expressed by the web team.

3) I have improved the performance of accessing the text of embedded tokens (originally complained about by the Schliemann team). For a token implementing CharSequence there is a certain overhead in accessing each character, so for long tokens an inputText-CharSequence.subSequence() gets used to have the most direct access to the characters.

4) Token.text().toString() caching is implemented (originally requested by Schliemann). Although some toString() usages can be eliminated by using the methods in TokenUtilities, there may be legitimate usages, e.g. as a key in a HashMap. The current implementation starts to cache text strings for longer tokens sooner than for short ones.

5) Embedded token list initialization should be more deterministic (the lazy ETL.init() was eliminated). This could possibly eliminate the infamous "ISE: Removing at index=xx but real index is yy ..." which can sometimes still be seen. I also plan to turn the read/write-locking checks (which can now be turned on by using a logger) into assertions (which would always be checked except in no-assertions FCS builds), but I first need to fix the problematic usages and also migrate the lexer's tests.

6) Token.isRemoved() is implemented, which allows determining whether a particular token has already been removed from the token hierarchy. This is used by the infrastructure, but it's also useful externally when doing offset binary searches in an array of tokens where some of them may already be removed. Since the removed tokens retain their original offsets, the binary search could run into an infinite loop; Token.isRemoved() allows remedying that problem.
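The token-splitting described in point 1) can be sketched as follows (a self-contained illustration; PartToken and the field names here are hypothetical, not the actual org.netbeans.lib.lexer types): a token lexed over the joined input that crosses a section boundary is cut into per-section parts, each remembering its position inside its own section.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not the real lexer-module types) of splitting a
 * token lexed over joined sections into per-section parts.
 */
public class JoinTokenSketch {

    /** One part of a logical token, located inside a single embedded section. */
    record PartToken(int sectionIndex, int offsetInSection, String text) { }

    /**
     * Split a token starting at {@code startInJoined} (an offset in the
     * joined input) into parts according to the section lengths.
     */
    static List<PartToken> split(String tokenText, int startInJoined, int[] sectionLengths) {
        List<PartToken> parts = new ArrayList<>();
        int sectionStart = 0; // start of the current section in the joined input
        int consumed = 0;     // chars of tokenText already assigned to parts
        for (int i = 0; i < sectionLengths.length && consumed < tokenText.length(); i++) {
            int sectionEnd = sectionStart + sectionLengths[i];
            int tokenPos = startInJoined + consumed;
            if (tokenPos < sectionEnd) {
                int take = Math.min(tokenText.length() - consumed, sectionEnd - tokenPos);
                parts.add(new PartToken(i, tokenPos - sectionStart,
                        tokenText.substring(consumed, consumed + take)));
                consumed += take;
            }
            sectionStart = sectionEnd;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Joined input made of two sections of lengths 5 and 7.
        // A 6-char token starting at joined offset 3 crosses the boundary,
        // so it becomes two parts: "ID" in section 0 and "ENTX" in section 1.
        List<PartToken> parts = split("IDENTX", 3, new int[] {5, 7});
        System.out.println(parts);
    }
}
```

In the real implementation only these parts live in the per-section token lists, while the logical join token is represented once, which is what keeps the JoinTokenList lightweight.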
Created attachment 62062 [details] Diff of the proposed change ~660kB
I have added a JoinRandomTest to test the algorithm by random document modifications. It revealed more than 10 problems that I have fixed. I have tried opening/editing several file types, including Java and JSP. I hope the change does not introduce any significant regressions.

changeset:   82742:f6b2e0181e07
tag:         tip
user:        Miloslav Metelka <mmetelka@netbeans.org>
date:        Thu Jun 05 00:37:02 2008 +0200
summary:     #117450 - Provide unified LexerInput across multiple joined embedded sections.
You seem to have broken commit validation due to an incompatible change in lexer.
I have added default bodies to the newly added methods: http://hg.netbeans.org/main/rev/5264dc709dd0
Test compilation for many modules also got broken due to the change in constructor signatures in TestRandomModify.
Apologies. Since the constructor of Token is

    protected Token() {
        if (!(this instanceof org.netbeans.lib.lexer.token.AbstractToken)) {
            throw new IllegalStateException("Custom token implementations prohibited."); // NOI18N
        }
    }

I made the false assumption that I could add abstract methods to it. Thanks to Honza for fixing the problem. I have fixed the TestRandomModify: http://hg.netbeans.org/main/rev/ae1faae6f61b
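For context, the pattern involved can be sketched in a self-contained example (the class names here are hypothetical, not the real Token API): a constructor check only blocks foreign subclasses at runtime, so adding an abstract method still breaks subclasses compiled against the old version of the class. Giving the new method a default body, as was done in the fix above, keeps them compiling and linking.

```java
/**
 * Illustrative sketch of the compatibility issue (hypothetical names, not
 * the real lexer API). The constructor guard prevents custom subclasses at
 * runtime, but adding an *abstract* method would still be a source- and
 * binary-incompatible change for the blessed subclasses compiled earlier.
 */
public abstract class SealedBase {

    protected SealedBase() {
        // Runtime guard: only the blessed implementation may subclass.
        if (!(this instanceof BlessedImpl)) {
            throw new IllegalStateException("Custom implementations prohibited.");
        }
    }

    // Method added later. A default body (instead of declaring it abstract)
    // avoids breaking subclasses compiled against the old class version.
    public boolean isJoined() {
        return false;
    }

    static class BlessedImpl extends SealedBase {
        @Override
        public boolean isJoined() {
            return true;
        }
    }

    public static void main(String[] args) {
        SealedBase t = new BlessedImpl();
        System.out.println(t.isJoined()); // the blessed subclass overrides it

        try {
            // Any other subclass fails at construction time.
            new SealedBase() { };
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```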