I opened www/www/updates/alpha/dev_1.6_.xml as text in the editor in a dev build and, right before the block:

<module codenamebase="org.netbeans.core.execution" distribution="http://www.netbeans.org/download/nbms/alpha/dev/core-execution_fr.nbm" license="french_l10n-nbm-license.txt" downloadsize="3492" >
    <l10n langcode="fr" module_major_version="1" module_spec_version="1.1" />
</module>

I began typing a line:

<!-- XXX: -->

It was very slow; typing each character took a couple of seconds. I took a thread dump and am attaching it. Pasting an XML element (a few lines long) also took several seconds. But editing e.g. attributes in existing elements ran at normal speed.
Created attachment 13174 [details] Thread dump
Not sure why this happens - the editor fixes the lexer states. Anyway, we should fix this for 3.6.
Are you able to reproduce it?
Upgrading to P2 to keep track of this as "want-to-fix-to-nb36"
I now know what the bottleneck is and I'm going to fix it. The problem is that, while the lexer state infos are being updated, the text from the last updated line to the beginning of the next line is retrieved by doc.getText() to be scanned by the Syntax lexer class. This happens for the line where the modification happened and for every line that follows it until the lexical states match (which is at the end of the comment token). In a long comment many lines have to be updated, which causes many doc.getText() operations. If the text being retrieved spans the gap in the document's character buffer (which it does, because the gap is moved to the modification point), the characters must be copied into the target segment, which leads to the slowness. This problem can be reproduced for long java comments as well, and it exists in NB3.5 too. I will fix it by getting the text not only to the end of the next line but by doubling the previous size of the text retrieved. This will fix the problem, as there will be at most log2(doc.getLength()) doc.getText() operations.
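The effect of the doubling strategy described above can be illustrated with a small simulation. This is only a sketch, not the code from LineRootElement.java: the class and method names are hypothetical, a plain String stands in for the document, and each loop iteration stands in for one doc.getText() call.

```java
// Hypothetical simulation: counts how many text fetches are needed to cover
// the range [start, stopOffset) with two different fetch strategies.
final class DoublingFetchSketch {

    /** Builds a sample text of the given number of identical lines. */
    static String sampleText(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("comment line\n");
        }
        return sb.toString();
    }

    /** One fetch per line: the fetch count grows linearly with the line count. */
    static int fetchesLineByLine(String text, int start, int stopOffset) {
        int fetches = 0;
        int pos = start;
        while (pos < stopOffset && pos < text.length()) {
            int eol = text.indexOf('\n', pos);
            int end = (eol < 0) ? text.length() : eol + 1;
            fetches++; // stands in for one doc.getText(pos, end - pos)
            pos = end;
        }
        return fetches;
    }

    /** Each fetch is twice as large as the previous one: logarithmic fetch count. */
    static int fetchesDoubling(String text, int start, int stopOffset, int initialChunk) {
        int fetches = 0;
        int pos = start;
        int chunk = initialChunk;
        while (pos < stopOffset && pos < text.length()) {
            int end = (int) Math.min((long) text.length(), (long) pos + chunk);
            fetches++; // stands in for one doc.getText(pos, end - pos)
            pos = end;
            if (chunk <= Integer.MAX_VALUE / 2) {
                chunk *= 2; // double the size of the next fetch
            }
        }
        return fetches;
    }

    public static void main(String[] args) {
        String text = sampleText(5000);
        System.out.println("line by line: " + fetchesLineByLine(text, 0, text.length()));
        System.out.println("doubling:     " + fetchesDoubling(text, 0, text.length(), 16));
    }
}
```

For a 5000-line comment the line-by-line strategy performs 5000 fetches in this simulation, while the doubling strategy with an initial chunk of 16 characters performs 12. The price is that the last fetch may read more text than the lexer actually needs.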
I have implemented the fix that I described. Even with a comment about 5000 lines long, the typing response is about 500ms on my machine, while before the fix it was more than 5s.

Fixed in trunk:

Checking in libsrc/org/netbeans/editor/LineRootElement.java;
/cvs/editor/libsrc/org/netbeans/editor/LineRootElement.java,v <-- LineRootElement.java
new revision: 1.9; previous revision: 1.8
done

Mato, please approve the patch. Petre F., please test and verify the fix. Thanks.
I approve the fix. Adding visual diff also for better tracking: http://editor.netbeans.org/source/browse/editor/libsrc/org/netbeans/editor/LineRootElement.java.diff?r1=1.8&r2=1.9
We've checked the situation again with Petr F., and although typing now has acceptable performance, there is still a problem in two cases:

1) If the comment opening "<!--" is typed while the line spans are being recomputed (this starts ~4 secs after the file is opened and usually finishes within 0-3 secs, but may take longer on slower machines), then it can take more than a minute. The recomputation fetches the tokens for each line, which is slow in this case because the single comment token that gets created is ~550KB long (it extends to the end of the file).

2) When code completion is invoked at the beginning of the comment. Again, the completion fetches the tokens, which leads to fetching the big comment token.

It's clear that we cannot close this issue, but there is a workaround: type "<-- blahblah -->" and then fill in the "!". Or paste an initial "<!-- -->" from the clipboard and then edit the contents of the comment. Another hypothetical workaround is "Write more comments" :) If there were another comment, the Syntax lexer would stop the opened comment at the closing "-->" of the next comment.

The situation will improve dramatically with the introduction of the lexer module. Besides remembering the created tokens (so no recreating of the tokens is necessary), it also has an optimization that allows defining "important" characters for a particular token ('<','>','!','-' in the case of the xml comment) together with "guarded boundaries" at the beginning and end of the token (4 and 3 characters in the case of the xml comment). If the typed character is "unimportant" and outside of the guarded boundaries, then the token is considered unchanged (from the lexical point of view) and the language lexer is not invoked at all.

After I integrate the current improvement I will downgrade to P3 if there are no objections; otherwise we would need to waive this.
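The "important characters" and "guarded boundaries" check described above could look roughly like the following. This is a minimal sketch under stated assumptions: the class, constants, and method are hypothetical and do not reproduce the lexer module's actual API; only the character set and the boundary sizes (4 and 3) are taken from the comment above.

```java
// Hypothetical validator for an XML comment token such as "<!-- text -->".
final class CommentTokenValidatorSketch {

    /** Characters that may change how the comment token is lexed. */
    static final String IMPORTANT_CHARS = "<>!-";

    /** Length of the guarded region at the token start ("<!--"). */
    static final int GUARD_START = 4;

    /** Length of the guarded region at the token end ("-->"). */
    static final int GUARD_END = 3;

    /**
     * Returns true if inserting the typed character at the given offset
     * (relative to the token start) leaves the token lexically unchanged,
     * so that the language lexer would not need to be invoked.
     */
    static boolean insertionKeepsToken(int tokenLength, int offsetInToken, char typed) {
        if (IMPORTANT_CHARS.indexOf(typed) >= 0) {
            return false; // an important character always forces relexing
        }
        // The insertion point must not fall inside either guarded region.
        return offsetInToken >= GUARD_START
                && offsetInToken <= tokenLength - GUARD_END;
    }

    public static void main(String[] args) {
        String token = "<!-- abc -->";
        System.out.println(insertionKeepsToken(token.length(), 6, 'x'));  // inside the body
        System.out.println(insertionKeepsToken(token.length(), 6, '-'));  // important character
        System.out.println(insertionKeepsToken(token.length(), 2, 'x'));  // inside "<!--"
    }
}
```

With such a check the cost of a keystroke inside a large comment no longer depends on the token length, which is the property the comment above relies on.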
> After I integrate the current improvement I will downgrade to P3 if
> there are no objections or otherwise we would need to waive this.

I would rather leave it as P2, perhaps waive it with a commitment from the team to really fix it in promo-D (using or not using the lexer api). This is a serious performance killer.
Integrated into release36:

Checking in libsrc/org/netbeans/editor/LineRootElement.java;
/cvs/editor/libsrc/org/netbeans/editor/LineRootElement.java,v <-- LineRootElement.java
new revision: 1.8.10.1; previous revision: 1.8

Visual diff: http://www.netbeans.org/source/browse/editor/libsrc/org/netbeans/editor/LineRootElement.java.diff?r1=1.8&r2=1.8.10.1
Requesting waiver for 3.6

Justification
---------------------
The response time was improved already (from 5s to 500ms). There are still cases (see IZ 39446, comment from Mila 2004-03-05 05:51 PST), when the performance is not optimal and we currently don't have a short term solution for that.

User Impact
---------------------
Medium. It's a scalability problem, dependent on the size of the file and on the position of user comments in the file.

Workarounds
---------------------
1. User can copy/paste whole comment statement <!-- --> from clipboard
2. User can write <-- --> and add ! later

Long term solution
------------------------------------------
It should be solved by lexer module or additional optimization of existing code in the editor that handles tokens.
Waiver Approved
We were unable to address this issue in the promoD timeframe, therefore I would like to ask for a waiver for promoD. We should be able to find a fix for promoE, at least one that would autoinsert the "-->" together with the just typed "-" after a previously typed "<!-". The o.n.editor.Abbrev class could be extended and used for this purpose after some tweaking.
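The auto-insertion idea above can be sketched independently of the editor classes. The following is an illustration only: it does not use or extend o.n.editor.Abbrev, and the class and method names are hypothetical.

```java
// Hypothetical helper: decides what text to insert for a typed character so
// that a comment is closed in the same edit in which it is opened.
final class CommentAutoCloseSketch {

    /**
     * Returns the text to insert at the caret. When the typed '-' completes
     * the opening "<!--", the closing "-->" is appended so that an unclosed
     * comment never reaches the lexer.
     */
    static String textToInsert(String textBeforeCaret, char typed) {
        if (typed == '-' && textBeforeCaret.endsWith("<!-")) {
            return "-" + "-->";
        }
        return String.valueOf(typed);
    }

    public static void main(String[] args) {
        String before = "<root>\n<!-";
        // The document would then contain "<!---->" with the caret expected
        // to be placed between "<!--" and "-->".
        System.out.println(before + textToInsert(before, '-'));
    }
}
```

A real implementation would also have to position the caret between the opening and closing delimiters and to skip the insertion when the typed text is already inside a comment.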
I have tested the situation again with a recent 4.0 dev build and there are no longer any hangs, e.g. the completion runs in a separate thread. So the only problem seems to be decreased responsiveness - each keystroke takes roughly one second on my machine (on an older machine it can take a few seconds). Added xml team on cc.
I have created a possible fix for this issue. It's a workaround in which the XML comment is divided into separate tokens - one token for each line. I have tested it - it seems to be safe and the performance is great. See attached diff.
Created attachment 17749 [details] The patch diff
Marku, thanks for the patch, but as I've already said previously I'm not a fan of splitting tokens into pieces just to work around this problem. IMO the structure of the token should be retained. But as your team now maintains the xml module, it's generally up to you whether you decide to integrate this or not. BTW please make sure that any utilities relying on the structure of the comment token are updated as well if necessary (the BLOCK_COMMENT token id is used in XMLSyntaxSupport).
BTW regarding token splitting, we discussed this with Honza Lahoda (added to cc if he wants to speak up) when this issue was discovered at the 3.6 time. I'm trying to recollect the arguments that I had against the splitting, but I'm not sure whether I recall everything.

Generally there can be tools relying on the structure of the comment tokens. For example we could write a reformatting tool that wants to e.g. reformat the comment to have 80 columns per line. For such a tool it is better to have the whole token available than pieces, because the tool may need to know whether the line being reformatted is inside the comment token or at its beginning. Checking for "<!--" might be a solution for xml, where IIRC it's illegal to have a double dash and continue the comment, but not e.g. for java, where you can legally have

/* block-token
/* still the same token */ // end-of-block-comment

so checking for "/*" at the beginning of the split-comment-token does not guarantee that it's the real beginning of the multi-line-block-comment-token.

There could be a solution to have separate token ids

BLOCK_COMMENT_START
BLOCK_COMMENT_LINE
BLOCK_COMMENT_END

besides the BLOCK_COMMENT token id that would be used for single-line cases, but having this for all multi-line-capable token ids is IMHO annoying from the usage point of view.

Also, regarding the future, the lexer module will support language embedding, i.e. a token (e.g. a javadoc comment token) will support splitting into tokens of another language (e.g. the javadoc language). It is easier and more natural to do the language embedding of the whole javadoc token rather than of its pieces.

So from my point of view I would rather fix it in a way other than token splitting. On the other hand I'm OK with such a fix for a single promotion in the xml module that we maintain.
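The java case mentioned above can be demonstrated with a small sketch. It is an illustration under simplifying assumptions: the scanner is hypothetical, ignores string literals and "//" line comments, and is not the NetBeans Syntax lexer.

```java
// Hypothetical scanner showing that a line starting with "/*" is not
// necessarily the start of a block comment.
final class BlockCommentStartSketch {

    /**
     * Returns the start offset of the block comment that contains the given
     * offset, or -1 if the offset is not inside a block comment.
     * Simplification: string literals and "//" comments are not recognized.
     */
    static int enclosingCommentStart(String text, int offset) {
        int pos = 0;
        while (true) {
            int open = text.indexOf("/*", pos);
            if (open < 0 || open > offset) {
                return -1; // no comment starts at or before the offset
            }
            int close = text.indexOf("*/", open + 2);
            int end = (close < 0) ? text.length() : close + 2;
            if (offset < end) {
                return open; // the offset lies inside this comment
            }
            pos = end; // continue scanning after the closed comment
        }
    }

    public static void main(String[] args) {
        String text = "/* block-token\n/* still the same token */";
        int secondLine = text.indexOf('\n') + 1;
        // The second line starts with "/*", yet the comment begins at offset 0.
        System.out.println(text.startsWith("/*", secondLine));
        System.out.println(enclosingCommentStart(text, secondLine));
    }
}
```

A tool that sees only per-line pieces of the comment cannot make this distinction without rescanning from an earlier position, which is the concern raised above.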
> So from my point of view I would rather fix it in other way than token
> splitting. On the other hand I'm OK with such fix for a single
> promotion in the xml module that we maintain.

This is exactly how it is meant - just a hotfix for one release, or until it is really and clearly fixed on the editor side. Currently I am not aware of any tools relying on the 'big XML token', so I hope this is not a big issue. Moreover, if there are any, I believe it is less pain to update them than to bother users with such poor performance.
Hi, (I think I should comment :-)). I think that the splitting fix is desirable and quite OK. If Marek has a working patch and no one knows of anybody who has a tool depending on the structure of tokens, I see no reason not to apply it. Maybe asking on nbdev would be good, only to make sure no one depends on the "whole comment" token. On the other hand, I am not convinced that the problem of opening a comment at the beginning of a 550KB file will be completely solved by the lexer (IMO it will still need to read each character of the file, so it will take linear time with respect to the length of the file).
As I've already mentioned, I'm OK with the fix. To cause as little confusion as possible I have filed a separate issue to track the editor's infrastructure problem with handling of typing within large comment tokens. It's issue 49296.

> On the other hand, I am not convinced that the problem of opening
> comment in the beggining of a 550KB file will be completely solved by
> the lexer (IMO it will still need to read each character of the file,
> so it will take linear time with respect of the length of the file).

I'm confident that it *will* fix the problem - I mean there will be no delays of several seconds between keystrokes. It should basically be fixed in two ways:

1) There will be token validators that make it possible to validate the token without fully rescanning it.

2) Even if the token is relexed, it will only be done once per modification, which should generally take just a fraction of a second (moreover there will be no copying of character buffers like there is now).

I have reassigned the issue to xml - it will be fixed there.
Fixed as proposed formerly - XML tokens are created for each line of the comment. This fix also causes a SyntaxElement to be created for each line of the comment - see XMLSyntaxSupport.createElement:277. This fix should be removed after the editor team solves the core of the problem in their code.

Checking in XMLDefaultSyntax.java;
/cvs/xml/text-edit/src/org/netbeans/modules/xml/text/syntax/XMLDefaultSyntax.java,v <-- XMLDefaultSyntax.java
new revision: 1.12; previous revision: 1.11
done
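The per-line splitting of the comment described above can be sketched as follows. This is not the code from XMLDefaultSyntax.java; the class and method names are hypothetical, and the sketch only shows how one large comment span could be cut into one span per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a comment span into one {offset, length}
// pair per line, each pair including its trailing newline if there is one.
final class PerLineCommentTokensSketch {

    static List<int[]> splitByLine(String text, int commentStart, int commentEnd) {
        List<int[]> spans = new ArrayList<>();
        int pos = commentStart;
        while (pos < commentEnd) {
            int eol = text.indexOf('\n', pos);
            // The piece ends after the newline, or at the end of the comment.
            int spanEnd = (eol < 0 || eol >= commentEnd) ? commentEnd : eol + 1;
            spans.add(new int[] {pos, spanEnd - pos});
            pos = spanEnd;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "<!-- a\nb\nc -->";
        for (int[] span : splitByLine(text, 0, text.length())) {
            System.out.println("offset=" + span[0] + " length=" + span[1]);
        }
    }
}
```

With this layout a keystroke only requires the piece on the edited line to be refetched, at the cost of the comment no longer being available as a single token.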
Removed 4.0_WAIVER_REQUEST keyword.
Verified in development build #200507131800 of NetBeans 4.2.