While solving #98966, I found the following problem. The Schliemann parser does not seem to support a use case where a language embedding is split across several top-level tokens. For example, in HTML a block comment spanning multiple lines is split at each newline for performance reasons. So if someone uses <script type="text/javascript"> <!-- function my() { alert("hi"); } --> </script> I need to create the JavaScript embedding for all 5 HTML block comment tokens (the first and last use skip lengths to skip the delimiters). However, the Schliemann parser does not seem to parse these pieces as one block of JavaScript but as five separate JavaScript pieces, which leads to features not working properly, spurious syntax errors being shown, etc. IMO, Schliemann should join the embedding pieces and parse them as a single block of JavaScript. The same problem can of course happen in other languages; HTML is just an example to illustrate.
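To make the request concrete, here is a minimal Java sketch of the joining idea. It is not Schliemann or NetBeans Lexer code: the names `Section`, `JoinedText` and `JoinDemo` are hypothetical and made up for illustration only. Each section holds one top-level token plus its skip lengths, the joined text is what a parser would see as one block, and a small table maps offsets in the joined text back to document offsets so that errors can be reported in the right place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one top-level token that embeds a piece of another language. */
final class Section {
    final int docOffset;  // offset of the token in the document
    final String text;    // full token text, including any delimiters
    final int startSkip;  // characters to skip at the start, e.g. the opening comment delimiter
    final int endSkip;    // characters to skip at the end, e.g. the closing comment delimiter

    Section(int docOffset, String text, int startSkip, int endSkip) {
        this.docOffset = docOffset;
        this.text = text;
        this.startSkip = startSkip;
        this.endSkip = endSkip;
    }

    /** The embedded part of the token, with the skipped delimiters removed. */
    String embedded() {
        return text.substring(startSkip, text.length() - endSkip);
    }
}

/** Joins embedded sections into one text and remembers where each piece came from. */
final class JoinedText {
    private final StringBuilder joined = new StringBuilder();
    // Each entry is {start in joined text, start in document, length}.
    private final List<int[]> map = new ArrayList<>();

    void add(Section s) {
        String e = s.embedded();
        map.add(new int[] { joined.length(), s.docOffset + s.startSkip, e.length() });
        joined.append(e);
    }

    String text() {
        return joined.toString();
    }

    /** Translates an offset in the joined text back to a document offset. */
    int toDocOffset(int joinedOffset) {
        for (int[] m : map) {
            if (joinedOffset >= m[0] && joinedOffset < m[0] + m[2]) {
                return m[1] + (joinedOffset - m[0]);
            }
        }
        throw new IndexOutOfBoundsException("offset " + joinedOffset);
    }
}

final class JoinDemo {
    public static void main(String[] args) {
        // The five comment tokens from the example, assuming one token per line.
        String[] pieces = { "<!--\n", "function my() {\n", "alert(\"hi\");\n", "}\n", "-->" };
        int[] startSkips = { 4, 0, 0, 0, 0 };
        int[] endSkips = { 0, 0, 0, 0, 3 };
        JoinedText jt = new JoinedText();
        int offset = 0;
        for (int i = 0; i < pieces.length; i++) {
            jt.add(new Section(offset, pieces[i], startSkips[i], endSkips[i]));
            offset += pieces[i].length();
        }
        // The joined text is what should be handed to the JavaScript parser as one block.
        System.out.println(jt.text());
    }
}
```

The offset table is the part that matters for the features mentioned above: the parser works on the joined text only, and anything it reports is translated back through `toDocOffset`.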
The problem is not directly in the parser but rather in the lexer, which lexes the embedded parts separately for each high-level token. I have filed issue #99664, which should help once it is fixed.
So, I am closing this issue. We are not able to parse a language that is split into multiple tokens.
Hanzi, I still think you will have to make some changes after Mila fixes issue #99664. So why not leave the issue open? It has the dependency, so you can subtract it from the overall issue count if anyone cares about bug numbers. The same applies to the other issue you closed as wontfix. Opinion?
I do not know of any such changes; I thought that it would be fixed in Lexer without any changes needed on my side. Please feel free to file a new issue after that with some explanation of what should be fixed. In its current state, the issue is not fixable for me.
Yes, the issue is not fixable now, and yes, in this simple example it will probably work without touching your code. But IMO you will need to use the TokenSequenceLists to walk through the separated sections of the same language once Mila fixes it (issue #95569). The case reported in this issue could probably also be fixed by the fix for #87014. Mile, can you give your expert opinion?
I have changed the summary to better describe the problem. The main problem is that an embedded language can be split into several tokens across the document. The high-level tokens containing the embedding MAY directly follow each other (1) or they can be separated by other tokens (2). We need to cover both situations. There is issue #87014 - Preserve lexer state between separate blocks of embedded language, which, once fixed, will automatically fix case #1. Mila is working on the 'TokenSequenceList' Lexer API extension, which allows iterating over the spread-out tokens of one language in a row - issue #95569. Once this is fixed, you need to use this API to fix #2.
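The two situations can be told apart mechanically. The following Java sketch only illustrates the distinction; it does not use the real Lexer API or the planned 'TokenSequenceList' extension, and the names `TopToken` and `EmbeddingGroups` are hypothetical. It groups top-level tokens by the language they embed and checks whether the tokens of one language directly follow each other (case 1) or are separated by other tokens (case 2).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical top-level token; embeddedLanguage is null when nothing is embedded. */
final class TopToken {
    final String text;
    final String embeddedLanguage;

    TopToken(String text, String embeddedLanguage) {
        this.text = text;
        this.embeddedLanguage = embeddedLanguage;
    }
}

final class EmbeddingGroups {
    /** Collects, per embedded language, the indexes of the tokens carrying it, in document order. */
    static Map<String, List<Integer>> group(List<TopToken> tokens) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String lang = tokens.get(i).embeddedLanguage;
            if (lang != null) {
                groups.computeIfAbsent(lang, k -> new ArrayList<>()).add(i);
            }
        }
        return groups;
    }

    /** True for case 1 (tokens in a row), false for case 2 (other tokens in between). */
    static boolean isContiguous(List<Integer> indexes) {
        for (int i = 1; i < indexes.size(); i++) {
            if (indexes.get(i) != indexes.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }
}
```

In both cases the parser would receive all sections of one language together; the only difference is whether other tokens have to be stepped over while collecting them.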
*** Issue 100642 has been marked as a duplicate of this issue. ***
*** Issue 101119 has been marked as a duplicate of this issue. ***
*** Issue 101273 has been marked as a duplicate of this issue. ***
*** Issue 101746 has been marked as a duplicate of this issue. ***
*** Issue 101861 has been marked as a duplicate of this issue. ***
It causes highly visible issues like issue 100607 - start/end tag matching, problems with navigator, etc. The number of duplicates is also large. Increasing priority to P1.
The issue is considered an M9 stopper, so please fix it ASAP.
*** Issue 101938 has been marked as a duplicate of this issue. ***
*** Issue 101878 has been marked as a duplicate of this issue. ***
This looks like a misunderstanding. Parsing embedded sections as one piece is an RFE. It may be implemented, but it needs some prototyping. But it does not block the HTML or JSP implementation as far as I know.
1) HTML: The given use case with the script tag should be fixed in the HTML lexer - html/editor module. You should parse the script body as one token. It is not possible to fix this issue on my side. It is possible to implement an HTML lexer/parser without any changes in the languages/engine module. That is proved by the languages/html module implementation.
2) JSP: It should be possible to implement a JSP lexer/parser based on the current Schliemann engine too. Just use the HTML language as the top-level language for JSP and mark all JSP blocks as HTML whitespace. The same technique should be used for rhtml.
We can discuss it more on Monday.
After my one-to-one meeting with Hanz, the resolution so far is:
#1 is mostly about issue #87014, and yes, I can work around it. If Mila does not fix the issue, I'll do that for M9.
#2 The proposed solution of switching the JSP and HTML languages in terms of embedding would fix the HTML tag pairing problems, but would introduce the same problems in JSP. So it does not look like an ideal solution. After a discussion with Hanz, it looks like the correct solution is to parse all the embedded languages separately (all joined pieces at once), create separate ASTs for them and let all the features work on their own AST. For features like navigator, which need to work with all ASTs, merge them somehow reasonably (there seem to be some unsolvable problems with crossed tags - folding, navigator). The solution with multiple ASTs is too complicated to be done in M9, so for the milestone I'll try to work around the problem by resolving the HTML AST items in JSP's AST resolver. However, I am not sure yet whether it will work.
Implementation of this feature for M9 is risky, and we probably do not have enough time anyway.
*** Issue 101936 has been marked as a duplicate of this issue. ***
I have implemented a workaround for this issue for milestone 9. The JSP AST resolver now collects all the HTML pieces, joins them into one AST and lets it be processed by the HTML AST resolver. The result is that the JSP document's navigator contains a view of the HTML content of the file; JSP nodes are not shown. The problem with incorrectly marked unpaired tags is fixed. I'm downgrading the issue to P2 since it is not that urgent for M9 now. I expect that if this issue is fixed properly in M10, I'll remove the workaround.
Modified:
languages/engine/src/org/netbeans/modules/languages/parser/LLSyntaxAnalyser.java
web/jspsyntax/src/org/netbeans/modules/web/core/syntax/JSP.java
html/editor/src/org/netbeans/modules/html/editor/resources/HTML.nbs
ide/golden/group-friend-packages.txt
ide/golden/friend-packages.txt
html/editor/src/org/netbeans/modules/html/editor/HTML.java
html/editor/nbproject/project.xml
Log: A temporary M9 solution for #99526 - Parser doesn't parse separated embedded sections of one language as one piece
*** Issue 102373 has been marked as a duplicate of this issue. ***
Fixed in trunk:
IDE: [5/21/07 6:26 PM] Committing "Generic Languages Framework" started
Checking in modules/languages/parser/LLSyntaxAnalyser.java;
/cvs/languages/engine/src/org/netbeans/modules/languages/parser/LLSyntaxAnalyser.java,v <-- LLSyntaxAnalyser.java
new revision: 1.35; previous revision: 1.34
done
Checking in api/languages/ASTItem.java;
/cvs/languages/engine/src/org/netbeans/api/languages/ASTItem.java,v <-- ASTItem.java
new revision: 1.4; previous revision: 1.3
done
Checking in api/languages/ASTNode.java;
/cvs/languages/engine/src/org/netbeans/api/languages/ASTNode.java,v <-- ASTNode.java
new revision: 1.11; previous revision: 1.10
done
IDE: [5/21/07 6:26 PM] Committing "Generic Languages Framework" finished