99526 – Parser doesn't parse separated embedded sections of one language as one piece

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.


Summary: Parser doesn't parse separated embedded sections of one language as one piece

Status:	RESOLVED FIXED

Alias:	None

Product:	obsolete
Classification:	Unclassified
Component:	languages
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	issues@obsolete

URL:
Keywords:

Duplicates (2):	101273 102373
Depends on:
Blocks:	98721 98966 99244 99509 100607 103589

Reported:	2007-03-30 12:44 UTC by Marek Fukala
Modified:	2007-06-12 08:36 UTC
CC List:	2 users

See Also:
Issue Type:	DEFECT
Exception Reporter:


Description Marek Fukala 2007-03-30 12:44:31 UTC

While working on #98966, I found the following problem.

The Schliemann parser does not seem to support a use case where a language
embedding is split across multiple top-level tokens.

For example, in HTML a block comment spanning multiple lines is split at each
newline for performance reasons. So if someone writes

<script type="text/javascript">
<!--
function my() {
   alert("hi");
}
-->
</script>

I need to create the JavaScript embedding for all 5 HTML block comment tokens
(the first and last use skip lengths to skip the delimiters).

However, the Schliemann parser does not parse these pieces as one block of
JavaScript but as five separate JavaScript pieces, so features do not work
properly, spurious syntax errors are shown, etc.

IMO, Schliemann should join the embedding pieces and parse them as one single
block of JavaScript. The same problem can of course occur in other languages;
HTML is just an example to illustrate it.
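The joining requested above can be sketched as follows. This is a minimal illustration under stated assumptions, not Schliemann's or the NetBeans Lexer API's actual code: `Piece`, `JoinedSource` and `EmbeddingJoiner` are hypothetical names, and each embedded section is assumed to arrive as its language, its start offset in the document and its text with the delimiters already skipped. The pieces of one language are concatenated in document order, and a per-character offset table lets a position found in the joined text (for example a syntax error) be reported at the right place in the original document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: Piece, JoinedSource and EmbeddingJoiner are hypothetical names,
// not Schliemann or NetBeans Lexer API classes.
final class EmbeddingJoiner {

    // One embedded section: its language, start offset in the document, and text
    // (delimiters such as "<!--" and "-->" are assumed to be skipped already).
    record Piece(String language, int start, String text) {}

    // The joined text, plus the document offset of every joined character.
    record JoinedSource(String text, int[] documentOffsets) {
        // Maps a position in the joined text back to the original document.
        int toDocumentOffset(int joinedIndex) {
            return documentOffsets[joinedIndex];
        }
    }

    // Concatenates all pieces of the given language in document order.
    static JoinedSource join(List<Piece> pieces, String language) {
        List<Piece> selected = new ArrayList<>();
        for (Piece p : pieces) {
            if (p.language().equals(language)) {
                selected.add(p);
            }
        }
        selected.sort(Comparator.comparingInt(Piece::start));

        int total = 0;
        for (Piece p : selected) {
            total += p.text().length();
        }
        StringBuilder text = new StringBuilder(total);
        int[] offsets = new int[total];
        int k = 0;
        for (Piece p : selected) {
            text.append(p.text());
            // Remember where each character of this piece lives in the document.
            for (int i = 0; i < p.text().length(); i++) {
                offsets[k++] = p.start() + i;
            }
        }
        return new JoinedSource(text.toString(), offsets);
    }

    public static void main(String[] args) {
        // Two JavaScript sections separated by an HTML token (hypothetical offsets).
        List<Piece> pieces = List.of(
                new Piece("text/javascript", 10, "var a = 1;\n"),
                new Piece("text/html", 21, "<p>hi</p>"),
                new Piece("text/javascript", 30, "alert(a);\n"));
        JoinedSource js = join(pieces, "text/javascript");
        System.out.print(js.text());
        System.out.println("joined index 11 -> document offset " + js.toDocumentOffset(11));
    }
}
```

The pieces are concatenated without separators, which suits adjacent pieces such as the five comment lines above. For sections separated by other tokens, a real implementation might need to insert a separator so that the end of one piece and the start of the next do not fuse into a single token.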

Comment 1 Marek Fukala 2007-04-02 11:16:14 UTC

The problem is not directly in the parser, but rather in the lexer, which lexes
the embedded parts separately for each high-level token. I have filed issue
#99664; fixing it should help.

Comment 2 Jan Jancura 2007-04-05 10:06:29 UTC

So, I am closing this issue.
We are not able to parse a language split into multiple tokens.

Comment 3 Marek Fukala 2007-04-05 11:53:56 UTC

Hanzi, I still think you will have to make some changes after Mila fixes issue
#99664. So why not leave the issue open? It has the dependency, so you can
subtract it from the overall issue count if anyone cares about bug numbers.
The same applies to the other issue you closed as wontfix. Opinion?

Comment 4 Jan Jancura 2007-04-05 12:16:26 UTC

I am not aware of such changes; I thought it would be fixed in the Lexer
without any changes needed on my side. Please feel free to file a new issue
after that, with some explanation of what should be fixed.
The issue in its current state is not fixable for me.

Comment 5 Marek Fukala 2007-04-05 12:34:27 UTC

Yes, the issue is not fixable now, and yes, in this simple example it will
probably work without touching your code. 

But IMO you will need to use the TokenSequenceLists to walk through the
separated sections of the same language once Mila fixes it (issue #95569).

The case reported in this issue could probably be also fixed by fix of #87014.

Mile, can you express your expert opinion?

Comment 6 Marek Fukala 2007-04-11 13:53:24 UTC

I have changed the summary to better describe the problem.

So the main problem is that an embedded language can be split into multiple
tokens across the document. The high-level tokens containing the embedding MAY
follow each other directly (1) or they can be separated by other tokens (2). We
need to cover both situations.

There is issue #87014 - Preserve lexer state between separate blocks of
embedded language, which, once fixed, will automatically fix case #1.

Mila is working on the 'TokenSequenceList' Lexer API extension (issue #95569),
which allows iterating over the scattered tokens of one language as if they
were in a row. Once this is fixed, you need to use this API to fix #2.

Comment 7 Marek Fukala 2007-04-12 09:51:13 UTC

*** Issue 100642 has been marked as a duplicate of this issue. ***

Comment 8 Marek Fukala 2007-04-16 13:10:30 UTC

*** Issue 101119 has been marked as a duplicate of this issue. ***

Comment 9 Marek Fukala 2007-04-17 13:07:47 UTC

*** Issue 101273 has been marked as a duplicate of this issue. ***

Comment 10 Dan Kolar 2007-04-19 14:13:28 UTC

*** Issue 101746 has been marked as a duplicate of this issue. ***

Comment 11 Dan Kolar 2007-04-20 08:19:30 UTC

*** Issue 101861 has been marked as a duplicate of this issue. ***

Comment 12 Martin Schovanek 2007-04-20 11:21:33 UTC

It causes highly visible issues like issue 100607 (start/end tag matching),
problems with the navigator, etc. And the number of duplicates is large.
Increasing priority to P1.

Comment 13 Martin Schovanek 2007-04-20 11:24:39 UTC

The issue is considered an M9 stopper, so please fix it ASAP.

Comment 14 Dan Kolar 2007-04-20 13:23:03 UTC

*** Issue 101938 has been marked as a duplicate of this issue. ***

Comment 15 Marek Fukala 2007-04-20 13:24:17 UTC

*** Issue 101878 has been marked as a duplicate of this issue. ***

Comment 16 Marek Fukala 2007-04-20 13:26:35 UTC

*** Issue 101938 has been marked as a duplicate of this issue. ***

Comment 17 Marek Fukala 2007-04-20 13:26:55 UTC

*** Issue 101938 has been marked as a duplicate of this issue. ***

Comment 18 Jan Jancura 2007-04-21 14:32:51 UTC

This looks like a misunderstanding. Parsing embedded sections as one piece is
an RFE. It may be implemented, but it needs some prototyping. However, it does
not block the HTML or JSP implementation as far as I know.

1) HTML
The given use case with the script tag should be fixed in the HTML lexer
(html/editor module): you should lex the script body as one token. It's not
possible to fix this issue on my side. It is possible to implement an HTML
lexer/parser without any changes in the languages/engine module, as the
languages/html module implementation proves.

2) JSP
It should be possible to implement a JSP lexer/parser based on the current
Schliemann engine too. Just use HTML as the top-level language for JSP and mark
all JSP blocks as HTML whitespace. The same technique should be used for rhtml.

We can discuss it more on Monday.

Comment 19 Marek Fukala 2007-04-23 11:36:41 UTC

After my one-to-one meeting with Hanz, the resolution so far is:

#1 is mostly about issue #87014, and yes, I can work around it. If Mila doesn't
fix the issue, I'll do that for M9.

#2 The proposed solution of swapping the JSP and HTML languages in terms of
embedding would fix the HTML tag pairing problems, but would introduce the same
problems in JSP. So it does not look like an ideal solution.

After a discussion with Hanz, it looks like the correct solution is to parse
each embedded language separately (all of its joined pieces at once), create a
separate AST for each of them and let all the features work on their own AST.
For features like the navigator, which need to work with all the ASTs, merge
them in some reasonable way (there seem to be some unsolvable problems with
crossed tags in folding and the navigator).
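The first step of the "one AST per embedded language" plan, collecting every embedded piece and building one joined source per language, might look roughly like the sketch below. It is an illustration under assumptions, not the engine's code: `Piece` and `PerLanguageSources` are hypothetical names, the piece model (language, start offset, text) is assumed, and the MIME-type strings are only illustrative keys. Parsing each source and merging the resulting ASTs for the navigator are deliberately left out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch only: Piece and PerLanguageSources are hypothetical, not NetBeans APIs.
final class PerLanguageSources {

    // One embedded section: its language, start offset in the document, and text.
    record Piece(String language, int start, String text) {}

    // Language -> all of that language's pieces joined in document order.
    static Map<String, String> joinByLanguage(List<Piece> pieces) {
        return pieces.stream()
                // Sort first; the sequential groupingBy keeps this order per group.
                .sorted(Comparator.comparingInt(Piece::start))
                .collect(Collectors.groupingBy(
                        Piece::language,
                        TreeMap::new, // deterministic iteration order by language key
                        Collectors.mapping(Piece::text, Collectors.joining())));
    }

    public static void main(String[] args) {
        // HTML interleaved with JSP scriptlets (hypothetical offsets).
        List<Piece> pieces = List.of(
                new Piece("text/html", 0, "<ul>"),
                new Piece("text/x-jsp", 4, "<% if (ok) { %>"),
                new Piece("text/html", 19, "<li>"),
                new Piece("text/x-jsp", 23, "<% } %>"),
                new Piece("text/html", 30, "</ul>"));
        // Each value would be handed to its own language's parser to build a separate AST.
        joinByLanguage(pieces).forEach((lang, src) -> System.out.println(lang + ": " + src));
    }
}
```

Keeping one joined source per language is what lets each language's features work on their own AST; anything that needs a combined view, such as the navigator, would then have to map nodes back to document offsets and merge them.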

The solution with multiple ASTs is too complicated to be done in M9, so for the
milestone, I'll try to work around the problem by resolving the HTML AST items
in JSP's AST resolver. However, I am not yet sure whether it will work.

Comment 20 Jan Jancura 2007-04-23 23:12:44 UTC

Implementation of this feature for M9 is risky, and we probably do not have
enough time anyway.

Comment 21 Jiri Skrivanek 2007-04-25 13:11:20 UTC

*** Issue 101936 has been marked as a duplicate of this issue. ***

Comment 22 Marek Fukala 2007-04-25 15:16:43 UTC

I have implemented a workaround for this issue for milestone 9.

The JSP AST resolver now collects all the HTML pieces, joins them into one AST
and lets the HTML AST resolver process it. As a result, the JSP document's
navigator shows a view of the HTML content of the file; JSP nodes are not
shown. The problem with incorrectly marked unpaired tags is fixed.

I'm downgrading the issue to P2 since it is no longer that urgent for M9. If
this issue is fixed properly in M10, I expect to remove the workaround.

Modified:
   languages/engine/src/org/netbeans/modules/languages/parser/LLSyntaxAnalyser.java
   web/jspsyntax/src/org/netbeans/modules/web/core/syntax/JSP.java
   html/editor/src/org/netbeans/modules/html/editor/resources/HTML.nbs
   ide/golden/group-friend-packages.txt
   ide/golden/friend-packages.txt
   html/editor/src/org/netbeans/modules/html/editor/HTML.java
   html/editor/nbproject/project.xml

Log:
 A temporary M9 solution for #99526 - Parser doesn't parse separated embedded
sections of one language as one piece

Comment 23 Jan Jancura 2007-04-27 14:15:04 UTC

*** Issue 102373 has been marked as a duplicate of this issue. ***

Comment 24 Jan Jancura 2007-05-21 17:26:41 UTC

fixed in trunk:
IDE: [5/21/07 6:26 PM] Committing "Generic Languages Framework" started
Checking in modules/languages/parser/LLSyntaxAnalyser.java;
/cvs/languages/engine/src/org/netbeans/modules/languages/parser/LLSyntaxAnalyser.java,v  <--  LLSyntaxAnalyser.java
new revision: 1.35; previous revision: 1.34
done
Checking in api/languages/ASTItem.java;
/cvs/languages/engine/src/org/netbeans/api/languages/ASTItem.java,v  <--  ASTItem.java
new revision: 1.4; previous revision: 1.3
done
Checking in api/languages/ASTNode.java;
/cvs/languages/engine/src/org/netbeans/api/languages/ASTNode.java,v  <--  ASTNode.java
new revision: 1.11; previous revision: 1.10
done
IDE: [5/21/07 6:26 PM] Committing "Generic Languages Framework" finished