Forum Discussion

flipa_29928
Nimbostratus
Mar 08, 2011

The F5 Monitoring Service Data Source Singleton Error

Hi All,

We recently installed the F5 Management Pack as per the wiki guide (RMS first, then the Management Server to be used for F5 discovery/monitoring).

Today we attempted our first discovery of an F5 device. Because we wanted to confirm whether the big3d agent might be upgraded and possibly cause some impact, we chose to discover the standby node of our F5 load balancers.

One of the F5 administrators used his account named account, which is a member of the Administrator role on the F5 appliances, to perform the discovery task on the management server.

The discovery completed successfully and eventually the device appeared in the appropriate F5 Management Pack state view with a green tick.

I then noticed, however, an alert in the SCOM Console's Active Alerts view relating to the F5 Management Pack Monitoring Service on the designated Management Server that the discovery was performed on.

If I expand the Health Explorer for the F5 Monitoring Service on this Management Server, I can see the following roll-ups are Critical:

+Entity Health
+Availability
+Data Layer
+Operations Manager Connections: - F5.MonitoringService (F5 Management Pack Monitoring Service)
+Data Source Server - F5.MonitoringService (F5 Management Pack Monitoring Service)
+Operations Manager Connector - F5.MonitoringService (F5 Management Pack Monitoring Service)

Clicking on the bottom two unit monitors shows no Knowledge text, and the State Change history tab for each is as follows:

+Data Source Server - F5.MonitoringService (F5 Management Pack Monitoring Service)
Description: The PerformanceDataSourceConnector connection to Operations Manager Health Service host localhost was lost: Failed to connect to an IPC Port: The system cannot find the file specified.

+Operations Manager Connector - F5.MonitoringService (F5 Management Pack Monitoring Service)
Description: The PerformanceDataSourceConnector connection to Operations Manager Health Service host localhost was lost: Failed to connect to an IPC Port: The system cannot find the file specified.

In the Active Alerts view there is also an alert named "F5 Monitoring Service data Source Singleton error", whose Path is the Management Server from which the discovery task was run. The Description text for this alert states: "The F5 Monitoring Service data source been loaded multiple times."

The Knowledge text states:

Summary
The F5 Monitoring Service data source been loaded multiple times.

Configuration
Configuration of threshold and performance data rule overrides is part of Operations Manager override interface.

Causes
This most frequently occurs when rule overrides for thresholds or performance data collection are targeting incorrect objects.

Resolutions
Please attempt to fix or reset your rule overrides for thresholds and performance rules.

Additional Information
For additional information about the F5 Management Pack rule overrides, refer to the Management Pack Rule Override Wiki at: http://devcentral.f5.com/wiki/default.aspx/MgmtPack/PerformanceCollectionAndMonitoring.html

External Knowledge Sources
External knowledge sources. http://devcentral.f5.com/mpack

At this stage no additional configuration has been done to any rules or collectors for performance or availability monitoring. The only thing that has been done is to discover the device and let whatever default object discovery tasks are enabled run, to see what gets populated. Our plan is to turn on any desired performance and object discoveries as required once we get the Active node into the configuration.

However, before we get to that I would like to understand and fix this F5 Monitoring Service data Source Singleton error.

So, if anyone can explain the meaning of this error and how it can be resolved I would be most grateful. Is the error just the F5 Management Pack's way of saying we need to configure overrides if we expect to get any meaningful monitoring of the devices? Do I simply need to restart the F5 Monitoring Service on the Management Server to resolve this?

Thanking you for your considered reply,

flipa

20 Replies

  • Julian_Balog_34
    Historic F5 Account
    Michael,

    The account running the F5 Monitoring Service definitely has to have read/write access to the F5_ManagementPack SQL database. One simple way to set this up is to use SQL Server Management Studio as follows. I'll use a generic name for the account running the F5 Monitoring Service, e.g. 'f5acct'.

    - Under the SQL Server instance hosting the F5_ManagementPack :: Security :: Logins :: look for the 'f5acct' account :: is it there?

    - if yes :: select 'f5acct' :: right click :: Properties :: User Mapping :: check/enable F5_ManagementPack :: check/enable 'db_owner' and 'public' for the related user roles.

    - if not :: select Logins :: right click :: New Login :: enter login name :: Search 'f5acct' and point to it :: select 'User Mapping' :: check/enable 'F5_ManagementPack' :: check/enable 'db_owner' and 'public' for the related user roles.

    The DBAs in your SQL Server environment may choose to further restrict this access to just 'read/write', and in this case the approach would be slightly different, but bottom line is, the F5 Monitoring Service account needs to have read/write access to the F5_ManagementPack database.
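
    If the DBAs prefer the narrower grant mentioned above, the mapping could be scripted roughly as follows. This is only a sketch under assumptions: 'DOMAIN\f5acct' is a placeholder for the real service account, and the idea that db_datareader plus db_datawriter is sufficient is an assumption rather than something confirmed for the F5 Monitoring Service, which may also need rights such as EXECUTE on stored procedures. sp_addrolemember is used because it is available on older SQL Server versions.

```sql
-- Placeholder account name: replace DOMAIN\f5acct with the real service account.
USE [master];
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'DOMAIN\f5acct')
    CREATE LOGIN [DOMAIN\f5acct] FROM WINDOWS;
GO

USE [F5_ManagementPack];
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = N'DOMAIN\f5acct')
    CREATE USER [DOMAIN\f5acct] FOR LOGIN [DOMAIN\f5acct];
GO

-- Assumption: read/write via the fixed database roles instead of db_owner.
EXEC sp_addrolemember N'db_datareader', N'DOMAIN\f5acct';
EXEC sp_addrolemember N'db_datawriter', N'DOMAIN\f5acct';
GO
```

    If the service still fails to start with this narrower grant, falling back to the db_owner mapping described in the steps above is the safer choice.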

    Also, make sure the SQL Service Broker is enabled for the OperationsManager SQL database. You can check this with the following SQL query (a result of 1 means the broker is enabled, 0 means it is disabled).

    
    SELECT is_broker_enabled FROM sys.databases WHERE name = 'OperationsManager';

    In summary, according to Microsoft: the SQL Service Broker enables internal or external processes to send and receive guaranteed asynchronous messaging by using extensions to Transact-SQL. In distributed SCOM management server environments this service needs to be enabled.
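
    If the query above returns 0, the broker can usually be switched on with ALTER DATABASE. A minimal sketch, assuming the SCOM services that use the database have been stopped first and that a brief interruption is acceptable; SET ENABLE_BROKER needs exclusive access to the database, which is why the database is placed in single-user mode around it:

```sql
-- Check the current state: 1 = enabled, 0 = disabled.
SELECT name, is_broker_enabled FROM sys.databases WHERE name = N'OperationsManager';

-- Take exclusive access, enable the broker, then reopen the database.
ALTER DATABASE OperationsManager SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE OperationsManager SET ENABLE_BROKER;
ALTER DATABASE OperationsManager SET MULTI_USER;

-- Confirm the change took effect.
SELECT name, is_broker_enabled FROM sys.databases WHERE name = N'OperationsManager';
```

    ROLLBACK IMMEDIATE disconnects any open sessions, so this is best run in an agreed maintenance window.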

    Julian

  • Hi Julian,

    I have added the F5 Monitoring Service account to the SQL Server and set the User Mappings as per your description.

    I also ran the SQL query for verifying the status of the SQL Service Broker and it returned "1", which I believe means it is enabled.

    I then went on to the Management Server and restarted the F5 Monitoring Service, but it stopped within 20 or so seconds.

    I then re-entered the credentials for the service in the Log On tab and applied these, and got a pop-up message advising that log-on as a service rights had been granted.

    I restarted the service but, as before, it soon stopped and refuses to remain in a running state, which makes me wonder whether the reason is that the System Center Management service is already running.

    Does the F5 Monitoring Service need to be running before the System Center Management service to allow the F5 Monitoring Service to start correctly?

    Regards,

    Michael

  • Julian_Balog_34
    Historic F5 Account
    The F5 Monitoring Service depends on the SCOM Health Service, so the SCOM Health Service needs to be up and running for the F5 Monitoring Service to start.

    The reason the F5 Monitoring Service starts and then eventually stops is that it cannot connect to the SCOM Health Service. But the SCOM Health Service is obviously running, so either a permission issue is preventing the communication channel between the F5 Monitoring Service and the SCOM Health Service from being opened, or the F5 data sources are not loaded inside the SCOM Health Service.

    What I'd like to check next is that the F5 data sources and workflows are loaded and actually running inside the SCOM Health Service. What we've checked previously was to look for 'Failed Rules and Monitors'. So the task you'll be looking to run is "Show Running Rules and Monitors for this Health Service".

    You can run the task from SCOM Management Console :: Monitoring :: Operations Manager :: Management Server :: Management Server Health State :: select your management server :: Health Service Tasks (in the Action pane) :: Show Running Rules and Monitors for this Health Service.

    Make sure you see the F5 monitors and rules loaded and running. If not, then we may have a problem with the F5 Management Pack not being correctly imported in SCOM.

    Let me know.

    Julian
  • Hi Julian,

    I ran the "Show Running Rules and Monitors for this Health Service" task against the designated F5 Management Server, but the task failed with the following error:

    "Task execution error Error Code: -2130771883 (Unknown error (0x80ff0055))."

    When I ran the same task against another Management Server that is used primarily for Windows agent monitoring, the task completed successfully and I could see a list of running rules and monitors, so it looks like the Health Service on the F5 designated Management Server is not behaving correctly.

    By way of context, I should mention that the F5 Management Server is also used for SNMP network device monitoring and web application monitoring workflows. While there are quite a number of these, I do not see the processor or memory utilization on this Management Server as being overly busy.

    I also compared which account is being used by the System Center Management (SCOM) service on both Management Servers and they are identical, so all things being equal I don't think it is a question of the System Center Management service on the F5 Management Server having been configured with incorrect logon credentials.

    I suspect that rebuilding the Health Service Store for the F5 Management Server may correct the issue: stopping the System Center Management service, renaming the Health Service Store folder in the SCOM installation directory, and restarting the service so that the entire Health Service Store contents are rebuilt. That is not something I would do lightly unless absolutely necessary, and I would need to obtain proper change approval for the monitoring outage that it necessitates.

    I will also search online for any hints as to what the above error code might mean and, if I get nowhere, will submit a support case with MS Support to address the failing Health Service task. In the meantime, if you have any other diagnostic suggestions I would be grateful for them.

    Thanks again for your assistance.

    flipa

  • Julian_Balog_34
    Historic F5 Account
    Hi flipa,

    I'd probably do the same and try to flush the cached health state of the entire SCOM health model. You mentioned that the "System Center Management (SCOM) service" is configured on both Management Servers with the same account. Are you referring to the SCOM Health Service here (System Center Management service)? Is the SCOM Health Service configured to run with a different account than 'LocalSystem'? If that's the case, I would definitely change the account to LocalSystem (on both Management Servers). Is there a specific reason why you would run the Health Service with a different account than the default (and preferred) LocalSystem account?

    If you'd also like to clean the cached health state of the SCOM Health Service, probably the least disruptive (and safest) way would be to do the following:

    - stop the System Center services (SDK, Health, Config),
    - clear the Health Service state (deleting the files in the \System Center Operations Manager 2007\Health Service State\Health Service Store folder),
    - clear the Management Pack cache (deleting the files in the \System Center Operations Manager 2007\Health Service State\Management Packs folder),
    - clear the Health Service connector configuration cache (deleting the OpsMgrConnector.Config.xml file in \System Center Operations Manager 2007\Health Service State\Connector Configuration Cache\),
    - clear the SDK Service state (deleting the MomAuth.xml file in \System Center Operations Manager 2007\SDK Service State).

    I usually try to stay away from this radical procedure, but in cases like yours it's probably worth pursuing in order to start with a clean slate. (All of the deleted files will be reconstructed from the OperationsManager database when the SCOM services are restarted.)

    Julian

  • Hi Julian,

    To clarify, the management servers are using Local System.

    I have recreated the Health Service Store folder only, as I want to use the lightest-touch approach, and this seems to have helped the situation as the F5 Monitoring Service does not stop anymore.

    I was also able to run the "Show Running Rules and Monitors for this Health Service" task. I will try running this again now that the F5 Monitoring Service is stable and send you the results at the managementpack(at)f5(dot)com address.

    This means that we are essentially back to where we were prior to attempting the discovery of the active device, in that soon after the F5 Monitoring Service was restarted, Event IDs 401 and 806 were logged in the F5 Event Log as follows:

    EVENT ID 401
    The PerformanceDataSourceConnector connection to Operations Manager Health Service host localhost was lost: Failed to write to an IPC Port: The pipe is being closed.

    EVENT ID 806
    Unable to process device [F5 Device [xxx.xxx.xxx.xxx]] statistics due to data failure: The PerformanceDataSourceConnector connection to Operations Manager Health Service host HealthService could not be established: Failed to connect to an IPC Port: The system cannot find the file specified.
    : HealthService

    The 806 event repeats at 1 minute intervals with only the IP address of the referenced F5 device alternating between the two nodes, and after a while the 401 event will reappear, and so on.

    Looking at the health roll-up of the F5 Management Server, the F5 Monitoring Service monitor is now healthy since the service remains started, which leaves just the Data Source Server - F5.MonitoringService and Operations Manager Connector - F5.MonitoringService unit monitors as still critical.

    Can you please clarify what the IPC Port that the event log errors refer to is? A colleague hazarded a guess at the Inter-Process Communication service, but I don't want to assume.

    It may be that by some quirk we are not running this service in our environment, or that it is locked down such that the F5 Monitoring Service is hampered, but I can't be sure. All I know is that I do not see anything named Inter-Process Communication service in the services applet on the Management Server, so maybe we are missing something basic there?

    I'll try to forward the results of the Show Running Rules and Monitors task as soon as I can.

    Regards,

    Michael
  • Hi Julian,

    Unfortunately it looks like a new issue has cropped up: the Management Server now loses contact with the RMS and turns grey (including all the SNMP and web application monitoring workflows it performs), which is not a workable situation.

    I will try disabling the F5 Monitoring Service and restarting the Health Service to see if the Health Service turns grey even when the F5 Monitoring Service is stopped.

    If it remains stable, I would have to suspect that the F5 monitoring workload (such as it is, with no customization but continuous failures to write data) is having a negative impact on the Health Service.

    Can you please clarify F5's position on hosting the F5 workflows on a Management Server that is also performing other SCOM monitoring tasks? Does the F5 MP require a dedicated Management Server?

    Regards,

    Michael
  • Julian_Balog_34
    Historic F5 Account
    Hi Michael,

    To first answer your question about the IPC (inter-process communication) error: the F5 Monitoring Service communicates with the SCOM Health Service through a named-pipe (IPC) connection. Named pipes can be regarded as file handles. In our case, the IPC channel between the F5 Monitoring Service and the SCOM Health Service is strictly local (i.e. the processes are running on the same host), so you may think of it like this: the F5 Management Pack data source (running the workflows) is loaded in the SCOM Health Service and opens up a "listener" (server) IPC channel, which the F5 Monitoring Service subscribes to. The IPC channel has its own security context, based on the authentication token of the F5 Monitoring Service.

    If the F5 Monitoring Service cannot connect to the SCOM Health Service through this IPC named pipe, it basically means that the F5 Monitoring Service cannot write to a file "owned" by the SCOM Health Service. The most obvious cause would be the security context of the I/O operation being denied. The other possible cause, if we rule out the security context, is that the F5 data source is not loaded in the SCOM Health Service and no one is listening for the F5 Monitoring Service. Do you actually see the F5 workflows running? (You said you ran the "Show Running Rules and Monitors for this Health Service" task.)

    Regarding your second question, about the F5 MP requiring a dedicated Management Server: you need to have the F5 MP installed on the RMS, at least. You said you have a distributed management server SCOM environment, so installing the F5 MP on the RMS is mandatory. If you decide to monitor the F5 devices from the RMS that's fine, but you'll not be following Microsoft's best practices. To avoid the load on the RMS, you would install the F5 MP on a secondary management server (not really dedicated) and run the F5 device discovery and monitoring from there. This management server can run other agents and workflows as well. It doesn't need to be 'dedicated' from the F5 MP's perspective.

    Let me know if your F5 MP workflows are running.

    Thanks!

    Julian
  • Hi Julian,

    Thank you for clarifying the role of the IPC connection in all this.

    Alas, when I managed to run the Show Running Rules and Monitors task it was with the F5 Monitoring Service stopped, and I did not keep a copy of the output as I figured it was not going to show anything of use.

    But after starting the F5 Monitoring Service and getting the various "Health Service not healthy" events that turned the Management Server and all its workflows grey, I find that running the task once again fails with the same cryptic error code.

    So, I think I may need to stop the System Center Management service to kill off any HealthService process, manually kill off any remaining MonitoringHost process in Task Manager, then start the System Center Management service, wait half an hour or so for all workloads to initialize and the server to return to its usual CPU/memory activity, and then try running the task again just to verify whether it was the Health Service alone that is having the issue.

    If the task runs successfully, I will then start the F5 Monitoring Service and try running the task again. Based on observed behaviour so far, though, I expect that within a few minutes of the F5 Monitoring Service being started I will start seeing "Rules Unloaded" type events, then the memory usage will drop, signalling that workflows are not running, and eventually the Health Service will turn grey.

    But I will try it and let you know.

    Regarding the second aspect of your reply, the distributed configuration you described is what we have. I installed the F5 Management Pack on the RMS first as per the guide instructions, then on the secondary Management Server, which is also tasked with the SNMP and web application monitoring workflows. I am glad to hear that hosting the F5 workflows on a dedicated Management Server is not required, though I suspect we may end up going that route, as I think the additional load on the Management Server's physical memory (once we actually get the Management Pack working and enable performance collections) may well take the server close to its limit.

    Regards,

    Michael

  • Hi Julian,

    Apologies for not updating this topic.

    I have had to focus on other operational tasks, which has prevented me from dealing with this issue. It is still not resolved.

    The only thing I can report is that the F5 Monitoring Service still is not able to make an IPC connection. The last time I tried to stop the SCOM Agent and start the F5 Monitoring Service it appeared to work, but then I realized that all the SNMP monitoring workflows were no longer working on the management server, as though the SNMP service was obstructed. The only reason I knew this was happening is that I have some heartbeat-type alerts configured against SNMP devices, and these stopped after the last attempt at restarting the F5 Monitoring Service. So I am reluctant to keep stopping and starting these services unless there is a definite troubleshooting aim in doing so.

    Could you please get in touch with me again to pick up where we left off?

    The way things stand now, I fear that I may have to uninstall the F5 MP entirely and start from scratch, possibly with a separate fresh Management Server that is performing no other monitoring work, in order to better compare what is happening. But even so, there are aspects of the installation/configuration procedure regarding the accounts required and access levels which are ambiguous in the documentation, and I would appreciate assistance with them.

    Kind Regards,

    Michael