This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
When a source root is being indexed, for each file, it is necessary to find out which indexer should be indexing the given file. It seems to be appropriate to base this decision primarily on mime type (further filtering may occur). The problem is how to find out the mime type for a given file. FileUtil.getMIMEType requires a FileObject and is not very fast (see below). Hardcoded map between extensions and mime type does not work very well, as the user can assign a new extension to a mime type. Please note, that it may not be required to find out the mime type exactly - if there are only indexers for text/x-java and text/x-php5, it is irrelevant if the mime type is text/xml or text/x-ant+xml, the important thing is that it is surely neither text/x-java nor text/x-php5. So, the request is to provide an ability to find out the mime type of a file very quickly. I did a quick test that scans through a few source roots and sorts the files according their extension/mime type. Implementation through j.i.File took (having everything in OS caches) 17ms, implementation through FileObject and FileUtil.getMIMEType took 155ms (subsequent attempts in the same instance of the IDE were faster, but this won't help the initial scan usecase). There were 354 .java files (no other files were present), ~147 directories. I am attaching: -module41.zip - module that adds File/TEST menu item, which will run the test on "/tmp/outgoing" (will scan through /tmp/outgoing/*/src) - there are different variants of tests in org.test.module41.TEST class. -FileVSFileObject.zip - test data, unpack into /tmp
Created attachment 63111 [details] Test module.
Created attachment 63112 [details] Test data.
This issue is very important from PHP point of view. There are many type of files, which can be preprocessed with PHP. So we are not able to predict, in which files user wants to treat as PHP. Also projects like php wiki, or php media wiki has more then 1000 files so time of initial indexing is important for us as well.
Let someone add FileUtil.getMimeType(java.io.File) that will just use the already premade CachedFileObject to guess the mime type. The CachedFileObject shall be modified to handle common operations during recognition (getExt, getName, getNameExt, getInputStream) just directly on the provided java.io.File, for the others it would really convert the File to FileObject, but as these operations are not used during mime type check, it should not matter if these are a bit slower.
I tried suggested solution with FileUtil.getMimeType(java.io.File) but it doesn't help in full IDE. Maybe it helps for PHP IDE only where less resolvers are registered. Using slightly modified test by Honza Lahoda with php wiki project (~450 php files and other files) I got 140 ms for File.getName(), 2016 ms for FileUtil.getMIMEType(FileObject) and 2234 ms for FileUtil.getMIMEType(File). It seems also other methods than (getExt, getName, getNameExt, getInputStream) are used during recognition and FileObject is created anyway. Probably it might be better to consider some filtering of acting resolvers as Honza mentioned.
Can I see results from profiler? E.g. profile it or show me the patch to try myself later.
Created attachment 63593 [details] Patch adding FileUtil.getMimeType(java.io.File).
I spent some time on this. Firstly I made various performance improvements in MIMESupport and BaseFileObj to make getMIMEType(FileObject) faster (http://hg.netbeans.org/core-main/rev/e1ae60dcb027). Times in miliseconds of resolving of 1706 various files in php wiki project in java IDE are as follows: 3 consequent runs - 9047, 7172, 7515 after improvements - 1922, 1172, 1250 It will never be as fast as using only java.io.File.getName because some resolvers read content of files. I have also tried to better implement FileUtil.getMIMEType(java.io.File) as Jarda suggested. It is faster but the difference is not very big (1328, 985, 984). I think it is not worth to integrate it. Are these performance improvements enough for you or do you still need something extra fast? You can also look at 2 snapshots from profiler for getMIMEType(FileObject) and getMIMEType(File) but I think there is not too much to be further improved.
Created attachment 66776 [details] Profiler snapshot of getMIMEType(FileObject).
Created attachment 66777 [details] Profiler snapshot for getMIMEType(File).
Created attachment 66779 [details] Patch implementing getMIMEType(File).
I am backing out #e1ae60dcb027 as it caused test failures. When changing basic infrastructure like this it is wise to run all applicable unit tests locally.
Integrated into 'main-golden', available in build *200808080201* on http://bits.netbeans.org/dev/nightly/ Changeset: http://hg.netbeans.org/main/rev/e1ae60dcb027 User: Jiri Skrivanek <jskrivanek@netbeans.org> Log: #137734 - Various performance improvemts for MIME type resolution. Namely cache extension in MIMESupport.CachedFileObject, do not call unnecessarily lastModified and override getPath. In BaseFileObj replace File.getParentFile by faster getParent and cache FileObjectFactory.
Of course I run tests locally and they passed on my WindowsXP. I will try to investigate what is wrong on unix.
I see. I can reproduce the test failures (in masterfs) on Linux.
"some resolvers read content of files" - you may skip such resolvers if you enhance the API a bit. First of all you would need to give Jan a method like: public String getMimeType(FileObject fo, Set<String> onlyFromThese); which would try to guess the mime type, and if it is not from onlyFromThese, it would return null. This might skip some of the resolvers, especially if you add new resolver constructor: protected MIMEResolver(String[] mimeTypesICanRecognize) then you do not need to talk to any sniffing resolver in case Jan is only interested in text/x-java. My 2 Kč.
Thank you Jesse for backing out. I found a problem which caused tests to fail and pushed my performance improvements again (http://hg.netbeans.org/core-main/rev/db43e97030a0). I agree with Jarda than something like FileUtil.getMimeType(FileObject fo, Set<String> onlyFromThese); is necessary to improve MIME resolving perfomance even more. Maybe better there should be MIMEResolver.providesMIMETypes() which could work also for declarative resolvers. I will look at it.
Integrated into 'main-golden', available in build *200808210201* on http://bits.netbeans.org/dev/nightly/ Changeset: http://hg.netbeans.org/main/rev/db43e97030a0 User: Jiri Skrivanek <jskrivanek@netbeans.org> Log: #137734 - Various performance improvemts for MIME type resolution. Namely cache extension in MIMESupport.CachedFileObject, do not call unnecessarily lastModified and override getPath. In BaseFileObj replace File.getParentFile by faster getParent.
As discussed in this issue I suggest the following API change. An extra parameter is added to method FileUtil.getMIMEType(FileObject, String...). Parameter contains one or more MIME types which we are only interested in. It means only MIMEResolver instances which supply at lease one of these MIME types are queried. That's why MIMEResolver abstract class was extended by method getMIMETypes(). It has to return a String array of MIME types which can be resolved by this resolver.
Created attachment 68337 [details] Patch implementing FileUtil.getMIMEType(FO, String...).
[JG01] You cannot add an abstract method to MIMEResolver. That would be grossly incompatible. It must have a concrete implementation, e.g. returning null to signify "unknown/any". [JG02] Do not use equalsIgnoreCase unless there is a compelling reason to believe that case sensitivity would in fact be a bug. [JG03] The unit test passes withinMIMETypes == null yet this usage is not permitted by the Javadoc and should throw an exception. Side notes: [JG04] The implementation in MIMEResolverImpl.java need not copy Javadoc. Javadoc is inherited and subclasses should not duplicate the Javadoc of the parent. [JG05] Do not make long, detailed entries in apichanges.xml. Javadoc is the place to precisely specify behaviors and contracts. An apichanges entry should be a brief mention of what is different and why someone might care - focus on what existing module writers might need to do to fix their code to be compatible or to take advantage of a new opportunity. [JG06] Replace * It has to return not empty String array. * @return an array of MIME types which can be resolved by this resolver. which is redundant in several ways with * @return a non-empty array of MIME types [JG07] The Javadoc for the withinMIMETypes parameter is not very explicit. "which only should be considered" does not really tell me anything. You probably meant to say something such as: "Only resolvers whose {@link MIMEResolver#getMIMETypes} contain one or more of the requested MIME types will be asked if they recognize the file. It is possible for the resulting MIME type to not be a member of this list."
Y01 Updated modules over the NetBeans code base shall require new version of openide.filesystems to use its new API Y02 String[] getMIMETypes() shall not be public. It is not supposed to be called by clients of the API Y03 To [JG01] better to return null from getMIMETypes() by default and treat is as "I do not know yet" value, imho. Y04 To add more to Y02 but with lower priority: I believe it would be better to hide getMIMETypes() completely and instead have MimeResolver(String... supportedMIMETypes) constructor. That will ensure immutability of the result (which is what all current resolvers do anyway), simplify the amount of code people need to write - e.g. just call super(...). The only exception is .../declmime/MIMEResolverImpl.java but that class is in the same module and the code in canResolveMIMETypes can handle it in special way. Moreover it will allow us to speed things up if necessary and replace the "for (int i = 0; i < desiredMIMETypes.length; i++) { for (int j = 0; j < resolvableMIMETypes.length; j++) {...}" with something more effective, if needed.
Re: [JG01] Y03 I did it intentionally to force subclasses of MIMEResolver to implement it. But assertion in MIMESupport.canResolveMIMETypes if getMIMETypes returns null is probably enough. In such a case it doesn't make sense to return any not null value. Re: Y04 Idea with constructor is not clear to me. Could you write an example? I think we can think about protected String[](or Set<String>) MIMEResolver.resolvableMIMETypes field instead of MIMEResolver.getMIMETypes() method. This field can be then initialized in constructor. I don't know whether field or method is better. Other comments fixed in new patch. Thank you.
Created attachment 68531 [details] Patch implementing FileUtil.getMIMEType(FO, String...) - version2.
Y03 - yes, that sounds the same as JG01. Adding an assertion in case a MIMEResolver does not specify an explicit MIME type list is just as incompatible as adding an abstract method, and it is unacceptable in such a foundational class. You need to deal with old resolvers somehow, probably by always checking them. A one-line WARNING printed to log (once per session x resolver impl) is acceptable. I like Y04. JG07 would still apply - must be specified exactly what this means, e.g. whether the resolver is ever permitted to return a type which is not in the list. Be careful with varargs - since there is already a zero-arg constructor, having a MIMEResolver(String...) introduces a potential ambiguity: is super() a call to MIMEResolver() or to MIMEResolver(new String[0])? A slightly ugly but effective trick is to declare the constructor MIMEResolver(String mimeType1, String... otherMimeTypes) which also enforces that the list is not empty.
Why should I deal with old resolvers? They should be fixed immediately to not slow dow IDE. My change in FileUtil is anyway binary incompatible and modules using FileUtil.getMIMEType(fo) has to be recompiled. JG07 I thought your sentence "It is possible for the resulting MIME type to not be a member of this list." clearly says that the resolver IS permitted to return a type which is not in the list. Y04 - I tried to implement in patch 3. Also I tried super() and if I explicitely add default MIMEResolver constructor, super() calls it.
Created attachment 68551 [details] Patch implementing FileUtil.getMIMEType(FO, String...) - version 3.
JG01 - the API must continue to deal with any old resolvers floating around on the internet. An incompatible change here is not acceptable. You cannot fix all of them, though you can and should fix all of them you find in nb.org modules. (Don't forget contrib - though it can only be updated after the API change is in main.) Nor may you replace FileUtil.getMIMEType(FO) with gMT(FO, String...) as this is binary-incompatible. (Even modules that were recompiled would be broken: passing String[0] would logically mean that the return value must always be null!) You may only add a new overload of the method. JG07 - I am just asking for the exact specification to be documented. In the most recent patch, in the new MIMEResolver constructor, it is not documented. The Javadoc says only "MIME types which can be resolved by this resolver". It does not clearly specify whether findMIMEType is permitted to return some other MIME type, and if it is not, what (if anything) will happen if it tries. The getMIMEType Javadoc has been updated satisfactorily. Y04 is not yet reflected in apichanges.xml JG08 - {@link #getMIMEType(FileObject, String[]) getMIMEType(FileObject)} makes no sense. Simplify to {@link #getMIMEType(FileObject, String[]). JG09 - use Parameters.notNull in getMIMEType. JG10 - the MIMEResolver(String...) constructor does not check for a null or empty array, nor for nulls in the array.
I updated patch according to your comments. Also I changed assert in canResolveMIMETypes to warning as you suggested sooner.
Created attachment 68634 [details] Patch implementing FileUtil.getMIMEType(FO, String...) - version 4.
[JG11] apichanges.xml claims the change is incompatible, which I hope it no longer is. [JG12] You could make the no-arg variant of the MIMEResolver constructor @Deprecated to help alert module developers to the desirability of specifying MIME types. [JG13] The comment //the MIMEResolver(String...) constructor does not check for a null or //empty array, nor for nulls in the array. is strange since the code directly above it does exactly that. [JG14] The warning should be printed in the MIMEResolver constructor, not in canResolveMIMETypes.
Patch is updated. Re: [JG14] The warning cannot be printed in the MIMEResolver() constructor because declarative resolvers are created this way. BTW, in contrib the following will be updated: contrib/fortress.editing/src/org/netbeans/modules/fortress/editing/FortressMimeResolver.java contrib/scala.editing/src/org/netbeans/modules/scala/editing/ScalaMimeResolver.java
Created attachment 68655 [details] Patch implementing FileUtil.getMIMEType(FO, String...) - version 5.
JG14 - declarative resolvers could use a different, private constructor which does not warn. E.g. @Deprecated public MIMEResolver() { this(false); warn(....this.getClass().getName()...); } MIMEResolver(boolean ignored) {} public MIMEResolver(String... mimeTypes) { this(false); ... } Anyway the essential point is that the warning should be printed at most once per MIMEResolver implementation class per session, not once per resolver per call to getMIMEType, which would quickly fill up the log file with spam.
Re: JG14 - There is a condition which fulfils the requirement that warning is printed just once per MIMEResolver instance which is in fact once per session. I think MIMEResolver class should not become overcomplicated.
I will integrate suggested patch.
Fixed. http://hg.netbeans.org/core-main/rev/81c36fefab56
Integrated into 'main-golden', will be available in build *200809040201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress) Changeset: http://hg.netbeans.org/main/rev/81c36fefab56 User: Jiri Skrivanek <jskrivanek@netbeans.org> Log: #137734 - To speed up MIME type recognition it is added method FileUtil.getMIMEType(FileObject, String...). We can supply one or more MIME types which we are only interested in. Module writers have to override MIMEResolver default constructor and provide supported MIME types.
Integrated into 'main-golden', will be available in build *200809050201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress) Changeset: http://hg.netbeans.org/main/rev/1ca8bd7dc952 User: Jesse Glick <jglick@netbeans.org> Log: Added note to <compatibility> for #137734.